MyAlamat.my · TuxGeo+
Sovereign Malaysian Address Engine

Malaysian Address Geocoding & Standardization

Standardize, parse, and geocode complex Malaysian addresses in line with the MS 2039:2006 standard, using custom-trained tokenization and AI matching models.

Core Technology

Engine Features

Powering government agencies and commercial enterprises with precise address validation and geospatial lookups.

Address Normalization

BM/EN bilingual parsing and component extraction (street, building, unit, section, mukim, daerah, negeri) using custom-trained ML models.

Multi-Strategy Matching

Flexible fallback strategies that use postcode search, cadastral Lot/UPI mapping and, only when needed, secure Google Maps API lookups.

Quality Score Engine

Returns granular quality flags, from 0FOUND (exact match) to 9NOTFOUND (unresolved), so you can gauge the health of an address database.

Bilingual & Sovereign

Deploy fully on-premise or on a sovereign cloud, with BM/EN taxonomy support and full compliance with the Malaysian addressing standard MS 2039:2006.

Under the Hood

How MyAlamat Works

From a messy line of text to a verified rooftop coordinate: a national-scale address engine built on linguistic parsing, in-memory word vectors, and a conflated PostGIS corpus.

01 / 05

Ingest & Conflate

11+ national datasets fused into one canonical corpus

  • Property listings, TNB & IWK utility records, HERE & TomTom basemaps, cadastral Lot/UPI references, OSM & DDSA boundaries
  • Every record is normalized into 16 standard components and fingerprinted with an address hash for exact deduplication

One address, many witnesses

No single dataset describes a Malaysian address completely. MyAlamat treats each source as a witness: utility billing records prove occupancy, cadastral lot numbers anchor legal parcels, commercial basemaps contribute building footprints, and open boundaries frame every mukim, daerah and negeri. Records are normalized into 16 standard components, fingerprinted, deduplicated, and merged into a single canonical PostGIS corpus.
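The fingerprint-and-deduplicate step can be sketched as follows. This is a minimal illustration, not MyAlamat's implementation: the page says there are 16 standard components but names only some of them, so the `COMPONENTS` tuple, the normalization rules and the choice of SHA-256 are all assumptions.

```python
import hashlib
import unicodedata

# Hypothetical subset of the 16 standard components; the full list is not given here.
COMPONENTS = ("unit", "level", "building", "street", "section",
              "postcode", "mukim", "daerah", "negeri")


def normalize(value: str) -> str:
    """Unicode-normalize (NFKC), uppercase and collapse runs of whitespace."""
    value = unicodedata.normalize("NFKC", value).upper()
    return " ".join(value.split())


def address_hash(record: dict) -> str:
    """Fingerprint a record from its normalized components in a fixed order."""
    parts = [normalize(record.get(name, "")) for name in COMPONENTS]
    # "\x1f" counts as whitespace for str.split(), so normalize() strips it from
    # values; that makes it an unambiguous field separator.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> dict[str, dict]:
    """Merge records with identical fingerprints, keeping every witnessing source."""
    merged: dict[str, dict] = {}
    for rec in records:
        key = address_hash(rec)
        entry = merged.setdefault(key, {**rec, "sources": []})
        entry["sources"].append(rec.get("source", "unknown"))
    return merged
```

Two records that differ only in case and spacing (say, one from a TNB extract and one from IWK) collapse into a single entry whose `sources` list records both witnesses.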

Reverse geocoding resolves any coordinate to its administrative home across 16 negeri & wilayah persekutuan, 133 daerah and 2,052 mukim boundary polygons.
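In production this lookup would typically run inside PostGIS (for example with `ST_Contains` against the boundary tables). The pure-Python sketch below shows the same idea with an even-odd ray-casting test; the layer names and toy square polygons are hypothetical stand-ins for the real negeri, daerah and mukim boundaries.

```python
from typing import Optional

Point = tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def contains(ring: list[Point], pt: Point) -> bool:
    """Even-odd ray casting: count edge crossings of a ray from pt towards +x."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def reverse_geocode(pt: Point, layers: dict[str, dict[str, list[Point]]]) -> dict[str, Optional[str]]:
    """Return, per administrative level, the name of the first polygon containing pt."""
    return {
        level: next((name for name, ring in polygons.items() if contains(ring, pt)), None)
        for level, polygons in layers.items()
    }
```

A point outside every polygon maps to `None` at each level, which a caller can surface as unresolved rather than guessing.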

02 / 05

Parse & Tokenize

Custom-trained libpostal model for Malaysian addresses

  • 28 token types — unit, level, building, street, seksyen, taman, mukim, daerah, negeri and more
  • Bilingual BM/EN with synonym expansion (JLN → JALAN, BLK → BLOCK)
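Synonym expansion can be pictured as a token-level lookup applied before labelling. A minimal sketch, assuming a simple ASCII regex tokenizer: only JLN → JALAN and BLK → BLOCK come from the text; the other entries in the hypothetical `SYNONYMS` table are illustrative assumptions.

```python
import re

# Only JLN and BLK are taken from the page; the rest are illustrative.
SYNONYMS = {
    "JLN": "JALAN",
    "BLK": "BLOCK",
    "TMN": "TAMAN",
    "LRG": "LORONG",
    "SEK": "SEKSYEN",
}

# ASCII-only tokens for brevity; keeps "2/3" and "A-3-1" style numbers intact.
TOKEN = re.compile(r"[A-Z0-9]+(?:[/-][A-Z0-9]+)*")


def expand(address: str) -> list[str]:
    """Uppercase, tokenize and replace known abbreviations with their full form."""
    tokens = TOKEN.findall(address.upper())
    return [SYNONYMS.get(tok, tok) for tok in tokens]
```

For example, `expand("Jln 2/3, Tmn Melawati")` yields `["JALAN", "2/3", "TAMAN", "MELAWATI"]`.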
Tokenizer replay: 28 token types, custom-trained for Malaysian addresses

  Level 30      → level
  Menara Maxis  → building
  KLCC          → section
  50088         → postcode
  Kuala Lumpur  → city

03 / 05

Vectorize & Match

AddressVec — the corpus held in RAM as int64 word vectors

  • Every address word becomes a compact integer vector; lookups run entirely in memory
  • A cascade of strategies runs from strictest (postcode + street + state + house) down through progressively looser fallbacks
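Turning address text into fixed-width integers can be sketched with a 64-bit FNV-1a hash. This hash is swapped in for illustration; the engine's own hash function is not documented, so this sketch will not reproduce the hex codes in the AddressVec example. Hashing is applied per component, mirroring that example.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(text: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of text (unsigned result)."""
    h = FNV64_OFFSET
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def to_int64(u: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed int64."""
    return u - (1 << 64) if u >= (1 << 63) else u


def vectorize(components: list[str]) -> list[int]:
    """One signed 64-bit integer per normalized (uppercased, space-collapsed) component."""
    return [to_int64(fnv1a_64(" ".join(c.upper().split()))) for c in components]
```

Fixed-width integers make equality checks and hash-map lookups cheap and keep the whole corpus compact enough to sit in RAM.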
AddressVec: every word becomes an int64 vector, matched entirely in RAM

  Level 30      0xa50a7bc4fda80835
  Menara Maxis  0xed0f5cc14c0e07f8
  KLCC          0xfb8da8602bf5b059
  50088         0x1c1f866060a4bc53
  Kuala Lumpur  0x9cc80200c9e0306d

In-memory index: no database round-trip per lookup.
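The strategy cascade over an in-memory index might look like the sketch below. It is a hypothetical illustration, not MyAlamat's code: `CASCADE`, its field names, the tier label picked for each strategy and the BLAKE2b-based `word_id` are assumptions loosely modelled on the quality tiers. Each strategy gets its own hash map keyed by a tuple of 64-bit ids, so a lookup is a handful of dictionary probes.

```python
import hashlib
from typing import Optional

Coord = tuple[float, float]

# Hypothetical cascade, strictest first: one representative strategy per tier band.
CASCADE: list[tuple[str, tuple[str, ...]]] = [
    ("0FOUND", ("postcode", "street", "state", "house")),
    ("1FOUND", ("postcode", "street", "section")),
    ("4FOUND", ("street", "section")),
    ("6FOUND", ("building",)),
]


def word_id(text: str) -> int:
    """Signed 64-bit id for a normalized component (BLAKE2b stand-in hash)."""
    normalized = " ".join(text.upper().split()).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(normalized, digest_size=8).digest(), "big", signed=True)


def strategy_key(rec: dict, fields: tuple[str, ...]) -> Optional[tuple[int, ...]]:
    """Key for one strategy, or None if the record lacks any required field."""
    if not all(rec.get(f) for f in fields):
        return None
    return tuple(word_id(rec[f]) for f in fields)


class AddressIndex:
    """One in-memory hash map per strategy: tuple of int64 ids -> coordinate."""

    def __init__(self, corpus: list[dict]):
        self.maps: dict[str, dict[tuple[int, ...], Coord]] = {tier: {} for tier, _ in CASCADE}
        for rec in corpus:
            for tier, fields in CASCADE:
                key = strategy_key(rec, fields)
                if key is not None:
                    self.maps[tier].setdefault(key, rec["coord"])  # first record wins

    def lookup(self, query: dict) -> tuple[str, Optional[Coord]]:
        """Try strategies from strictest to loosest; report the tier that matched."""
        for tier, fields in CASCADE:
            key = strategy_key(query, fields)
            if key is not None and key in self.maps[tier]:
                return tier, self.maps[tier][key]
        return "9NOTFOUND", None
```

Because the tier is decided by which strategy fired, every result carries its grade for free, and an unmatched query falls through to 9NOTFOUND instead of a guess.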
04 / 05

Locate & Grade

PostgreSQL 18 + PostGIS 3.6 with H3 spatial hashing

  • Coordinates resolved against the canonical corpus; reverse geocoding via mukim, daerah and negeri polygons
  • Every result carries a transparent quality tier from 0FOUND to 9NOTFOUND — no silent guesses
  0FOUND          Postcode + street + state + house: exact rooftop match
  1–3FOUND        Postcode anchored to a street or seksyen
  4–5FOUND        Street + seksyen matched without a postcode
  6–7FOUND        Anchored to a named building
  8REMOVESECTION  Recovered by retrying without the section token
  GMAPS           Resolved via the external fallback geocoder
  9NOTFOUND       Honestly flagged as unresolved; never silently guessed
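Client code can treat the tiers as an ordered scale, both to filter individual results and to summarize a batch. A minimal sketch, assuming the flags inside each band are named `1FOUND`, `2FOUND` and so on, and that `GMAPS` ranks between `8REMOVESECTION` and `9NOTFOUND` as in the table order; the `5FOUND` default threshold is an arbitrary example.

```python
from collections import Counter

# Assumed flag names: the "1–3FOUND", "4–5FOUND" and "6–7FOUND" bands are
# expanded into individual flags, and GMAPS sits where the tier table lists it.
TIER_ORDER = [
    "0FOUND", "1FOUND", "2FOUND", "3FOUND", "4FOUND", "5FOUND",
    "6FOUND", "7FOUND", "8REMOVESECTION", "GMAPS", "9NOTFOUND",
]
RANK = {flag: i for i, flag in enumerate(TIER_ORDER)}


def accept(flag: str, worst_allowed: str = "5FOUND") -> bool:
    """True if flag is at least as good as worst_allowed (lower rank = better)."""
    return RANK[flag] <= RANK[worst_allowed]


def health_report(flags: list[str], worst_allowed: str = "5FOUND") -> dict:
    """Share of each tier in a batch, plus the share that passes the threshold."""
    if not flags:
        raise ValueError("empty batch")
    counts = Counter(flags)
    total = len(flags)
    return {
        "by_tier": {flag: counts[flag] / total for flag in TIER_ORDER if counts[flag]},
        "accepted": sum(accept(f, worst_allowed) for f in flags) / total,
    }
```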
05 / 05

Serve & Audit

Sovereign REST API with a full audit trail

  • POST /address/v1/geocode, /correct and /match — an OpenAPI 3.1 contract behind NGINX
  • ClickHouse audit logging; ships as a single Docker Compose reference stack for on-premise installs
  • 1.37M addresses geocoded in 39 seconds (national IWK batch: 38 min naive, ~59× faster with AddressVec)
  • 16 structured components per address, from unit to negeri, hashed for deduplication
  • 11+ national datasets conflated: utilities, cadastre, basemaps, open data
  • 2,052 mukim boundary polygons, plus 133 daerah and 16 negeri & wilayah persekutuan
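The batch figures translate into per-address cost with simple arithmetic. The inputs are the page's rounded numbers, so the implied speedup comes out near 58.5×, in line with the stated ~59× once the rounding of "38 min" is allowed for.

```python
# Figures from the page, rounded as published.
addresses = 1_370_000          # "1.37M addresses"
fast_seconds = 39              # with AddressVec
naive_seconds = 38 * 60        # "38 min naive"

throughput = addresses / fast_seconds              # addresses per second
per_address_us = fast_seconds / addresses * 1e6    # microseconds per address
speedup = naive_seconds / fast_seconds             # implied speedup

print(f"{throughput:,.0f} addresses/s, {per_address_us:.1f} µs each, {speedup:.1f}x faster")
# prints: 35,128 addresses/s, 28.5 µs each, 58.5x faster
```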

Where it's heading

Shipped

Public REST API

OpenAPI 3.1 contract — /geocode, /correct and /match endpoints behind NGINX with ClickHouse audit logging.

Shipped

Bilingual BM/EN output

Canonical negeri and component labels in both Bahasa Malaysia and English.

Shipped

Sovereign reference deployment

Reproducible single-host Docker Compose stack — PostgreSQL + PostGIS, API, NGINX, ClickHouse — for on-premise and air-gapped installs.

In progress

Compliance benchmark suite

Continuous precision/recall and latency (p50/p95/p99) benchmarking against curated ground truth.
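A benchmark of this kind typically reports precision, recall and latency percentiles. The sketch below uses one common set of definitions (precision over returned results, recall over all ground-truth items, nearest-rank percentiles); the suite's actual definitions are not specified, and scoring by canonical address id rather than by a distance threshold is an assumption.

```python
import math
from typing import Optional


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def precision_recall(predicted: dict[str, Optional[str]], truth: dict[str, str]) -> tuple[float, float]:
    """predicted: query id -> matched canonical id (None if unresolved); truth: query id -> expected id."""
    returned = {q: a for q, a in predicted.items() if a is not None}
    correct = sum(1 for q, a in returned.items() if truth.get(q) == a)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall
```

Under these definitions an honest 9NOTFOUND lowers recall but not precision, whereas a wrong guess lowers both, which rewards the "no silent guesses" policy.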

Planned

Cadastral & basemap refresh

Reimport of TomTom SmartMap building data and the national Lot/UPI dataset.

Planned

ML confidence & active learning

Model-scored match confidence and a feedback loop that learns from user-submitted corrections.

Live Sandbox

Address Parsing Demo

Enter any messy, unstructured Malaysian address to see the tokenization engine split it into standard components.

For Engineers

API Integration

Integrate Malaysian address parsing directly into your ERP, CRM, checkout pages, or data warehouse pipelines with lightweight REST APIs.

  • Developer Sandbox

    Get test API keys immediately, with 10,000 free geocoding requests per month.

  • Production SLAs

    Sub-50 ms API latency, high-availability multi-region deployments, and sovereign data-residency limits.

  • Malaysian Standards Compliant

    Output fields are explicitly mapped to the MS 2039:2006 specification.

# Post a messy address for geocoding
curl -X POST "https://maps.tuxgeo.dev/address/v1/geocode" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "address": "Level 30, Menara Maxis, KLCC, 50088 Kuala Lumpur"
  }'
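The same request can be made from Python using only the standard library. The endpoint, headers and body mirror the curl example; the response schema is not documented on this page, so the hypothetical `geocode` helper simply returns the parsed JSON without assuming any field names.

```python
import json
import urllib.request

API_URL = "https://maps.tuxgeo.dev/address/v1/geocode"


def build_geocode_request(address: str, api_key: str) -> urllib.request.Request:
    """Build the POST request equivalent to the curl call."""
    body = json.dumps({"address": address}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
    )


def geocode(address: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(build_geocode_request(address, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `geocode("Level 30, Menara Maxis, KLCC, 50088 Kuala Lumpur", "YOUR_API_KEY")`. Keeping request construction separate from sending makes the payload easy to inspect or test without network access.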

Request Sovereign Integration

Need on-premise deployment or custom SLAs? Drop us a message, and our team will get back to you shortly.