# Methodology & limitations

_Last updated: 2026-06-26_

This database documents Palestinian prisoners and detainees held in Israeli
custody. It is built for **traceability**: we record *claims made by sources*,
not free-floating "facts." This page explains where the data comes from, how it
is processed, and — just as importantly — what it does **not** tell you.

## What's in here

The current release covers Palestinians who **died in Israeli custody**, seeded
from B'Tselem's published fatalities record. The schema is designed to extend to
living detainees later; that data is gated and not yet published.

Each person record may carry: name(s) and spellings, age, gender, place of
residence, occupation, detention/legal status, date and place of death, whether
the body is being withheld, and a narrative paragraph from the source.

## Sources & provenance

- Every published record traces to at least one **source** (a citation: a
  B'Tselem case page, an organization statement, a report, a news item, or a
  Telegram post). Sources are stored with their URL, publication date, and —
  where possible — an archived copy of the original text.
- A record's `verification` field reflects corroboration, not truth:
  - `single_source` — asserted by one source.
  - `corroborated` — independently asserted by two or more sources.
  - `disputed` — sources conflict.
  - `unverified` — not yet assessed.
- The seed data is currently `single_source` (B'Tselem). Corroboration levels
  will rise as we ingest independent datasets (Palestinian Prisoner Committee,
  HaMoked, PHRI, Addameer).
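
The four levels above could be derived mechanically along the following lines.
This is a minimal sketch, not the project's implementation: the `Claim` class,
its fields, and the `assessed` flag are hypothetical. Counting distinct source
IDs is only a stand-in for independence, which in practice needs a reviewer's
judgment (two outlets repeating one statement are not independent).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One source's assertion about one field (hypothetical structure)."""
    source_id: str          # assumed identifier of the citing source
    value: Optional[str]    # the asserted value, e.g. a date of death


def verification_level(claims: list[Claim], assessed: bool = True) -> str:
    """Map a set of claims to one of the four verification labels."""
    if not assessed or not claims:
        return "unverified"
    # Any disagreement between asserted values marks the field as disputed.
    if len({c.value for c in claims}) > 1:
        return "disputed"
    # Same value everywhere: count how many distinct sources assert it.
    distinct_sources = {c.source_id for c in claims}
    return "corroborated" if len(distinct_sources) >= 2 else "single_source"
```

Under these assumptions, a record carrying only a B'Tselem claim comes out as
`single_source`, and adding a second source that asserts the same value moves
it to `corroborated`.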

## Human-gated publishing

Nothing appears in the public API or the dumps until a human reviewer marks it
`published`. This is enforced at the database level by row-level security: the
public (anon) key can read only rows where `is_published` is true. Imported and
AI-drafted records first land unpublished in a review queue. Members edit and
publish through the contributor admin.
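
The effect of the gate can be illustrated with a small, self-contained model.
This sketch uses Python's built-in `sqlite3` module, which has no row-level
security, so a filtered view stands in for the policy. The `persons` table, the
`public_persons` view, and the placeholder names are assumptions made for
illustration and do not describe the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_published INTEGER NOT NULL DEFAULT 0  -- new rows start unpublished
    );
    -- Stand-in for the row-level security policy: public reads see only
    -- rows a reviewer has published.
    CREATE VIEW public_persons AS
        SELECT id, name FROM persons WHERE is_published = 1;
""")

# Two imported records land in the review queue (unpublished by default).
conn.execute("INSERT INTO persons (name) VALUES (?)", ("Example Record A",))
conn.execute("INSERT INTO persons (name) VALUES (?)", ("Example Record B",))

# A human reviewer approves one of them.
conn.execute(
    "UPDATE persons SET is_published = 1 WHERE name = ?", ("Example Record A",)
)

# The public read path returns only the approved record.
public_rows = conn.execute("SELECT name FROM public_persons").fetchall()
```

The point of the model is the default: a record is invisible to public readers
unless someone takes an explicit action to publish it.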

## How the raw data is processed

The import pipeline is deterministic and re-runnable, and every editorial
decision it makes is logged to a **data-quality report**
(`data/reports/quality-report.md`). Specifically:

- **Names.** Source spreadsheets often list several transliterations in one cell
  ("A / B"). We split these into a primary name plus aliases; all are searchable.
  We do **not** merge two people just because they share a name — e.g. two
  distinct men named "Atta Youssef Hassan Fayyad" are kept as separate records.
- **Facilities.** The same prison appears under many names (Negev = Ketziot =
  al-Naqab; Ayalon = Ramla; Damon = Damum). We cluster these into one canonical
  facility with the variants recorded as aliases. Each clustering decision is in
  the quality report for review.
- **Dates.** Parsed to ISO `YYYY-MM-DD`. Where a source gives only a month, or
  the date is unrecoverable, the field is left null rather than guessed.
- **Status & body-withholding.** Inferred from the source narrative; treat
  these as derived, not authoritative.
- **Generic locations.** Strings like "Prison inside Israel" name no specific
  facility, so the facility link is intentionally left null.
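
The name, facility, and date rules above can be sketched as three small
functions. This is an illustration under stated assumptions, not the actual
import pipeline: the function names, the accepted date formats, and the choice
of which variant serves as the canonical facility name are all hypothetical.

```python
from datetime import datetime
from typing import Optional

# Hypothetical alias table. Which variant is canonical is an arbitrary
# choice here; the real clustering decisions are in the quality report.
FACILITY_ALIASES = {
    "negev": "Negev", "ketziot": "Negev", "al-naqab": "Negev",
    "ayalon": "Ayalon", "ramla": "Ayalon",
    "damon": "Damon", "damum": "Damon",
}

# Assumed input formats; anything else is treated as unrecoverable.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def split_names(cell: str) -> tuple[str, list[str]]:
    """Split an 'A / B' cell into a primary name and a list of aliases."""
    parts = [p.strip() for p in cell.split("/") if p.strip()]
    if not parts:
        return "", []
    return parts[0], parts[1:]


def canonical_facility(raw: str) -> Optional[str]:
    """Return the canonical facility, or None if no known facility is named."""
    return FACILITY_ALIASES.get(raw.strip().lower())


def parse_date(raw: str) -> Optional[str]:
    """Return ISO YYYY-MM-DD, or None rather than guessing a missing day."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

In this sketch a generic string such as "Prison inside Israel" and a
month-only date such as "March 2024" both come back as `None`, matching the
rule that fields are left null rather than guessed. An unrecognized facility
name also returns `None` here, so a real pipeline would need to flag those
cases for review instead of dropping them silently.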

## Limitations — please read

- **This is not a complete count.** It reflects only what documented sources have
  reported. Absence from this database does not mean a person was not detained or
  did not die.
- **Reported figures lag reality.** Many deaths are announced months later;
  dates of death and arrest are frequently approximate ("about three months
  after his arrest").
- **Causes of death are usually unknown** to the source and are not asserted here
  unless a source states them.
- **Transliteration is lossy.** English spellings vary; always check Arabic
  names and aliases where available.
- **Derived fields** (status, region, body-withholding) are best-effort
  inferences from narrative text and may be wrong. Where a derived field and
  the source narrative disagree, defer to the narrative.

## Corrections

Found an error? Corrections are welcomed and reviewed. (A public "report an
error" form is planned; until then, contact the maintainers.) Members make
corrections through the contributor admin; records can be corrected, unpublished,
or removed. A detailed change-audit log is a planned enhancement.

## Licensing

Source data is reused only under terms that permit it; absent written reuse
terms, sources are linked rather than redistributed. Citing a source does not
grant a license to redistribute it. See each source's terms.
