Full Text Search Implementierung
Full-Text Search Implementierung
Full-Text Search ermöglicht schnelle Suche in großen Textmengen. Lernen Sie Elasticsearch, PostgreSQL Full-Text und Meilisearch für Ihre Anwendung einzusetzen.
Full-Text vs. LIKE
┌─────────────────────────────────────────────────────────────┐
│ LIKE vs FULL-TEXT SEARCH │
├─────────────────────────────────────────────────────────────┤
│ │
│ SQL LIKE: │
│ SELECT * FROM products WHERE name LIKE '%laptop%' │
│ │
│ ❌ Full Table Scan (langsam) │
│ ❌ Keine Relevanz-Sortierung │
│ ❌ Keine Fuzzy-Suche (Tippfehler) │
│ ❌ Keine Stemming (laufen ≠ läuft) │
│ ❌ Keine Synonyme │
│ │
│ FULL-TEXT SEARCH: │
│ SELECT * FROM products │
│ WHERE to_tsvector('german', name) @@ │
│ to_tsquery('german', 'laptop') │
│ │
│ ✅ Index-basiert (schnell) │
│ ✅ Relevanz-Ranking │
│ ✅ Stemming (Wortformen) │
│ ✅ Stopwords entfernen │
│ ✅ Gewichtung möglich │
│ │
└─────────────────────────────────────────────────────────────┘
PostgreSQL Full-Text Search
-- tsvector: Durchsuchbares Dokument
-- tsquery: Suchanfrage
-- Einfache Suche
SELECT title, content
FROM articles
WHERE to_tsvector('german', title || ' ' || content) @@
to_tsquery('german', 'datenbank & optimierung');
-- Dedicated tsvector Column für Performance
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
setweight(to_tsvector('german', coalesce(title, '')), 'A') ||
setweight(to_tsvector('german', coalesce(content, '')), 'B');
-- GIN Index erstellen
CREATE INDEX articles_search_idx ON articles USING GIN(search_vector);
-- Trigger für automatische Updates
CREATE FUNCTION articles_search_update() RETURNS trigger AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('german', coalesce(NEW.title, '')), 'A') ||
setweight(to_tsvector('german', coalesce(NEW.content, '')), 'B');
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER articles_search_trigger
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_update();
-- Suche mit Ranking
SELECT
title,
ts_rank(search_vector, query) AS rank,
ts_headline('german', content, query, 'StartSel=<b>, StopSel=</b>') AS snippet
FROM articles, to_tsquery('german', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
-- Phrase Search
SELECT * FROM articles
WHERE search_vector @@ phraseto_tsquery('german', 'full text search');
-- OR Suche
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('german', 'mysql | postgresql');
-- NOT Suche
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('german', 'database & !nosql');
-- Prefix-Suche (Autocomplete)
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('german', 'prog:*');
Elasticsearch
# Docker Compose
version: '3'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
environment:
- discovery.type=single-node
- xpack.security.enabled=false
ports:
- "9200:9200"
# Index erstellen mit Mapping
PUT /products
{
"settings": {
"analysis": {
"analyzer": {
"german_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "german_stop", "german_stemmer"]
}
},
"filter": {
"german_stop": {
"type": "stop",
"stopwords": "_german_"
},
"german_stemmer": {
"type": "stemmer",
"language": "german"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "german_analyzer"
},
"description": {
"type": "text",
"analyzer": "german_analyzer"
},
"category": {
"type": "keyword"
},
"price": {
"type": "float"
}
}
}
}
# Dokument indexieren
POST /products/_doc
{
"name": "Gaming Laptop",
"description": "Leistungsstarker Laptop für Gaming und Arbeit",
"category": "Electronics",
"price": 999.99
}
# Einfache Suche
GET /products/_search
{
"query": {
"match": {
"name": "laptop"
}
}
}
# Multi-Field Suche
GET /products/_search
{
"query": {
"multi_match": {
"query": "gaming laptop",
"fields": ["name^2", "description"],
"type": "best_fields"
}
}
}
# Bool Query (komplexe Suche)
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "match": { "name": "laptop" } }
],
"filter": [
{ "term": { "category": "Electronics" } },
{ "range": { "price": { "lte": 1500 } } }
],
"should": [
{ "match": { "description": "gaming" } }
]
}
},
"highlight": {
"fields": {
"name": {},
"description": {}
}
}
}
# Fuzzy Search (Tippfehler)
GET /products/_search
{
"query": {
"match": {
"name": {
"query": "lapotp",
"fuzziness": "AUTO"
}
}
}
}
# Autocomplete (Prefix)
GET /products/_search
{
"query": {
"match_phrase_prefix": {
"name": "gam"
}
}
}
// PHP Elasticsearch Client
use Elastic\Elasticsearch\ClientBuilder;
$client = ClientBuilder::create()
->setHosts(['localhost:9200'])
->build();
// Suche
$params = [
'index' => 'products',
'body' => [
'query' => [
'multi_match' => [
'query' => $searchTerm,
'fields' => ['name^2', 'description'],
'fuzziness' => 'AUTO'
]
],
'highlight' => [
'fields' => [
'name' => new \stdClass(),
'description' => new \stdClass()
]
]
]
];
$response = $client->search($params);
foreach ($response['hits']['hits'] as $hit) {
echo $hit['_source']['name'];
echo $hit['highlight']['name'][0] ?? '';
}
Meilisearch (Einfache Alternative)
# Docker
docker run -d -p 7700:7700 \
-v $(pwd)/meili_data:/meili_data \
getmeili/meilisearch:latest
# Index erstellen und Dokumente hinzufügen
curl -X POST 'http://localhost:7700/indexes/products/documents' \
-H 'Content-Type: application/json' \
--data-binary '[
{"id": 1, "name": "Gaming Laptop", "price": 999},
{"id": 2, "name": "Office Laptop", "price": 599}
]'
# Suche
curl 'http://localhost:7700/indexes/products/search' \
-H 'Content-Type: application/json' \
--data-binary '{"q": "laptop"}'
# Suchbare Attribute konfigurieren
curl -X PATCH 'http://localhost:7700/indexes/products/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"searchableAttributes": ["name", "description"],
"filterableAttributes": ["category", "price"],
"sortableAttributes": ["price"]
}'
# Filter und Sortierung
curl 'http://localhost:7700/indexes/products/search' \
-H 'Content-Type: application/json' \
--data-binary '{
"q": "laptop",
"filter": "price < 1000 AND category = Electronics",
"sort": ["price:asc"]
}'
// JavaScript Client
import { MeiliSearch } from 'meilisearch';
const client = new MeiliSearch({ host: 'http://localhost:7700' });
// Index und Dokumente
const index = client.index('products');
await index.addDocuments([
{ id: 1, name: 'Gaming Laptop', price: 999 },
{ id: 2, name: 'Office Laptop', price: 599 }
]);
// Suche
const results = await index.search('laptop', {
filter: 'price < 1000',
sort: ['price:asc'],
limit: 20
});
console.log(results.hits);
Algolia (SaaS)
// JavaScript Client
import algoliasearch from 'algoliasearch';
const client = algoliasearch('APP_ID', 'API_KEY');
const index = client.initIndex('products');
// Dokumente indexieren
await index.saveObjects([
{ objectID: '1', name: 'Gaming Laptop', price: 999 },
{ objectID: '2', name: 'Office Laptop', price: 599 }
]);
// Suche
const { hits } = await index.search('laptop', {
filters: 'price < 1000',
hitsPerPage: 20,
attributesToHighlight: ['name']
});
// InstantSearch UI (React)
import { InstantSearch, SearchBox, Hits } from 'react-instantsearch-dom';
function Search() {
return (
<InstantSearch indexName="products" searchClient={client}>
<SearchBox />
<Hits hitComponent={Hit} />
</InstantSearch>
);
}
Vergleich
| Lösung | Vorteile | Nachteile |
|---|---|---|
| PostgreSQL | Keine Extra-Infrastruktur, ACID | Weniger Features, Performance bei Scale |
| Elasticsearch | Mächtig, skalierbar, Analytics | Komplex, hoher Ressourcenverbrauch |
| Meilisearch | Einfach, schnell, typo-tolerant | Weniger Features als ES |
| Algolia | Managed, sehr schnell, UI-Widgets | Kosten, Lock-in |
Autocomplete implementieren
// Backend Endpoint
app.get('/api/search/suggest', async (req, res) => {
const { q } = req.query;
if (!q || q.length < 2) {
return res.json([]);
}
// Meilisearch
const results = await index.search(q, {
limit: 5,
attributesToRetrieve: ['name']
});
res.json(results.hits.map(h => h.name));
});
// Frontend (mit Debounce)
const [query, setQuery] = useState('');
const [suggestions, setSuggestions] = useState([]);
const debouncedSearch = useMemo(
() => debounce(async (q) => {
if (q.length < 2) {
setSuggestions([]);
return;
}
const response = await fetch(`/api/search/suggest?q=${q}`);
const data = await response.json();
setSuggestions(data);
}, 300),
[]
);
useEffect(() => {
debouncedSearch(query);
}, [query]);
return (
<div>
<input value={query} onChange={e => setQuery(e.target.value)} />
<ul>
{suggestions.map(s => <li key={s}>{s}</li>)}
</ul>
</div>
);
💡 Empfehlungen:
1. Klein: PostgreSQL Full-Text reicht oft
2. Mittel: Meilisearch für einfache Integration
3. Groß: Elasticsearch für komplexe Anforderungen
4. Enterprise: Algolia wenn Budget vorhanden
5. Immer: Debounce für Autocomplete, Highlighting für UX
2. Mittel: Meilisearch für einfache Integration
3. Groß: Elasticsearch für komplexe Anforderungen
4. Enterprise: Algolia wenn Budget vorhanden
5. Immer: Debounce für Autocomplete, Highlighting für UX