Research project
A documentation of Sanye (Dahalo), a critically endangered Cushitic language of Kenya
This project creates a comprehensive audiovisual documentation of the Sanye language, its sociolinguistic situation, and the cultural practices of the Sanye (Dahalo) community in coastal Kenya.
- Duration: 2025–2026
- Contact: Ahmed Sosal Altayeb Mohammed Ali
- Funding: Endangered Languages Documentation Programme (ELDP), Small Grant
Sanye (ISO 639-3: dal, also known as Dahalo) is a critically endangered Cushitic language spoken by fewer than 400 people in small, dispersed communities along Kenya's coast (Lamu and Tana River Counties). Although its classification within Cushitic is debated and the language is important for comparative studies, Sanye remains severely under-documented. Existing resources consist of a grammatical sketch (Tosco, 1991), a phonetic study (Maddieson et al., 1993), and scattered wordlists. There is no comprehensive audiovisual corpus of naturalistic speech, and the language is threatened because intergenerational transmission has nearly ceased: the youngest fluent speakers are now in their early twenties.
This project addresses this critical documentation gap through intensive fieldwork conducted in August–September 2025 in seven Sanye villages (Mwankanda, Shekale, Bahati Njema, D'a'i, Kipini, Kwa Hanago, and Ngowi). In close collaboration with community coordinators and with full informed consent, we recorded 53 sessions with 124 participants. These produced 20 hours of video (14 hours of core documentation + 6 hours of sociolinguistic interviews), 19.5 hours of audio (12 hours of core documentation + 7.5 hours of sociolinguistic interviews), 1 hour of oral annotations (careful speech and phrase-by-phrase translations into Swahili), and 240 photographs. Session types include conversations on daily life and subsistence, oral narratives (trickster tales, personal histories, clan traditions), songs, dramatized cultural performances, ethnographic discussions (clan systems, material culture), and seven comprehensive sociolinguistic interviews documenting language vitality, attitudes, transmission patterns, and the negotiation of identity.
A distinctive methodological feature is the bilingual nature of the corpus: many recordings include Swahili explanations by (passive) speakers of Sanye, who translate and elaborate on the Sanye speech for younger, Swahili-dominant community members. Together with this metalinguistic Swahili commentary, the Sanye speech offers insight into the community's awareness of language loss, its strategies for transmitting knowledge across generations, and the realities of navigating language shift while maintaining ethnic identity.
The 124 participants vary in age (from elders born ca. 1935 to young people born 2005–2007), competence (fluent speakers, partial speakers, passive understanders, and non-speakers with Sanye identity), and clan affiliation (Walunku, Wamanta, ɦebalawa, ʔilaane, Noʔolawa, Suntumin, Digiʔima). The sociolinguistic interviews reveal complex attitudes, such as a strong preference for the self-designation "Sanye" over the stigmatizing exonym "Dahalo", alongside coexisting feelings of pride and shame shaped by decades of marginalization.
The corpus is deposited in the Endangered Languages Archive (ELAR) with comprehensive metadata. The collection serves multiple purposes: linguistic analysis, comparative Cushitic research, sociolinguistic study of language shift and identity, ethnographic documentation of marginalized hunter-gatherer communities, and a resource for potential revitalization efforts.
This project contributes to urgent global efforts to document endangered linguistic diversity before it is irretrievably lost, while respecting community priorities and ethical standards in collaborative documentation.