Skip to main content

Methodology

Who built this

I'm Pekka Setälä, a solo developer in Finland and a hobbyist gongfu tea drinker. I built gongfucha.app by myself, on nights and weekends, because the tools I wanted didn't exist the way I wanted them. More about me at pekkasetala.github.io and github.com/PekkaSetala.

The corpus

The app is backed by 84 hand-curated tea entries: greens, whites, yellows, oolongs, red (black) teas, sheng and shou puerh, and hei cha. Each entry carries origin, terroir, cultivar, processing, oxidation and roast level, flavor profile, firsthand tasting notes where I have them, and a full gongfu brewing schedule (temperature, leaf ratio, per-infusion times, rinse behavior). The corpus is released under CC-BY-4.0 at github.com/PekkaSetala/gongfucha-corpus. If that link 404s, the repo is still being separated from the app repo — check back shortly.

How retrieval works

When you describe a tea in free text, the app embeds your query with a local all-MiniLM-L6-v2 model, runs a hybrid search against Qdrant (name and alias boost plus cosine similarity over the embedded entries), and returns the closest match above a confidence threshold. If nothing clears the threshold, it falls back to an LLM call through OpenRouter, grounded in the top retrieved entries.

There is no LangChain or LlamaIndex here. The pipeline is built from primitives — embedding, vector store, retrieval, prompt augmentation — because this is both a portfolio project and a practical tool, and the honest answer is that both matter. I wanted to understand the mechanics, not wire together a framework.

Sources

Entries cite their sources. Most of the information is drawn from vendor catalogs and tea-writing sites that I read regularly: Verdant Tea, Yunnan Sourcing, Seven Cups, White2Tea, Path of Cha, TeaDB, and Mei Leaf. Where I've drunk the tea myself, the tasting notes are mine and the vendor copy is cross-checked, not quoted.

Updates

The corpus updates occasionally, as sources improve or as I drink and learn more. There is no schedule. If something is wrong, it gets fixed when I notice or when someone tells me.

Corrections

Found a factual error, a mistranslated cultivar name, a brewing schedule that's plainly wrong? Open an issue on the corpus repo at github.com/PekkaSetala/gongfucha-corpus/issues. There is no contact form on the site on purpose.

Most tea sites are vendors or aggregators. This one is a hobbyist writing down what the tea actually tastes like.