helakit

Helakit — a toolkit for validating and working with Sri Lankan data.

HelakitError

Bases: Exception

Base class for every exception raised by helakit.

InvalidInputError

Bases: HelakitError

Raised when an input is the wrong type or otherwise unusable.

This signals a programmer error, not a validation failure. A malformed NIC string returns a ValidationResult with is_valid=False; passing None instead of a string raises this exception.
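The split between the two failure modes can be sketched as follows. This is a minimal stand-in, not helakit's implementation: only the exception names come from this page, while check_nic_shape and its boolean return value are hypothetical.

```python
class HelakitError(Exception):
    """Stand-in for the library's base exception."""


class InvalidInputError(HelakitError):
    """Stand-in: raised when the input has the wrong type."""


def _ascii_digits(text: str) -> bool:
    return text.isascii() and text.isdigit()


def check_nic_shape(value):
    """Hypothetical helper: True/False for strings, an exception for anything else."""
    if not isinstance(value, str):
        # Wrong type is a programmer error, so it raises.
        raise InvalidInputError(f"expected str, got {type(value).__name__}")
    # Malformed data is an ordinary outcome, so it is reported, not raised.
    text = value.upper()
    old = len(text) == 10 and _ascii_digits(text[:9]) and text[9] in "VX"
    new = len(text) == 12 and _ascii_digits(text)
    return old or new
```

Callers can then reserve try/except for misuse and treat a False result as normal data handling.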

NICBatchResult dataclass

NICBatchResult(results: list[NicResult], duplicates: dict[str, list[int]] = dict(), summary: NICSummary = (lambda: NICSummary(0, 0, 0, 0, 0, 0, 0))(), df: Any | None = None)

The outcome of validating a list / dataframe of NICs.

Attributes:

Name Type Description
results list[NicResult]

One NicResult per input row, in the same order as the input.

duplicates dict[str, list[int]]

Mapping from canonical NIC (uppercased, V/X stripped) to the row indices it appeared at. Only entries with two or more indices are included.

summary NICSummary

Roll-up counts.

df Any | None

When the input was a pandas or polars DataFrame, this is a copy of that frame with helakit's per-row columns appended. None when the input was a list.
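The duplicates mapping described above can be approximated in a few lines. This is a sketch under the stated canonicalisation rule (uppercase, trailing V/X removed); canonical_nic and find_duplicates are hypothetical names, not part of helakit.

```python
from collections import defaultdict


def canonical_nic(value: str) -> str:
    """Uppercase the NIC and drop a trailing V/X suffix."""
    text = value.upper()
    return text[:-1] if text.endswith(("V", "X")) else text


def find_duplicates(nics: list[str]) -> dict[str, list[int]]:
    """Map each canonical NIC to its row indices, keeping repeated ones only."""
    seen: dict[str, list[int]] = defaultdict(list)
    for index, nic in enumerate(nics):
        seen[canonical_nic(nic)].append(index)
    # Entries with a single index are not duplicates and are dropped.
    return {nic: rows for nic, rows in seen.items() if len(rows) >= 2}
```

Because the suffix is stripped before grouping, "851234567V" and "851234567X" land in the same group.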

NICDecoded dataclass

NICDecoded(format: Literal['old', 'new'], dob: date, gender: Literal['male', 'female'], age: int, year: int, day_code: int, serial: int, check_digit: int, voting_eligible: bool | None)

Structured fields extracted from a valid NIC.

Attributes:

Name Type Description
format Literal['old', 'new']

Either "old" (9 digits + V/X) or "new" (12 digits).

dob date

Decoded date of birth.

gender Literal['male', 'female']

Either "male" or "female", derived from the day-of-year encoding.

age int

Age in completed years at the time of validation.

year int

Full birth year.

day_code int

Day-of-year encoding with the female 500 offset removed.

serial int

Serial number assigned on the registration day.

check_digit int

Final check digit (not currently verified; see helakit.nic._parse).

voting_eligible bool | None

True if the old NIC ends in V, False if X, None for new-format NICs.
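To make the field list concrete, here is a rough sketch of how the numeric fields could be sliced out of a structurally valid NIC. The digit positions are an assumption based on the commonly described layout (old: YY DDD SSS C plus V/X; new: YYYY DDD SSSS C) and are not confirmed by this page; DecodedFields and decode_fields are hypothetical names, and date-of-birth decoding is deliberately left out.

```python
from typing import NamedTuple, Optional


class DecodedFields(NamedTuple):
    """Hypothetical subset of the fields listed above."""
    format: str
    year: int
    day_code: int
    gender: str
    serial: int
    check_digit: int
    voting_eligible: Optional[bool]


def decode_fields(nic: str, century: int = 1900) -> DecodedFields:
    """Split a structurally valid NIC into its numeric fields."""
    text = nic.upper()
    if len(text) == 10:  # old format: 9 digits + V/X (assumed YY DDD SSS C)
        fmt = "old"
        year = century + int(text[0:2])
        raw_day, serial, check = int(text[2:5]), int(text[5:8]), int(text[8])
        voting: Optional[bool] = text[9] == "V"
    else:  # new format: 12 digits (assumed YYYY DDD SSSS C)
        fmt = "new"
        year = int(text[0:4])
        raw_day, serial, check = int(text[4:7]), int(text[7:11]), int(text[11])
        voting = None
    # Female NICs carry a 500 offset on the day-of-year value.
    gender = "female" if raw_day > 500 else "male"
    day_code = raw_day - 500 if raw_day > 500 else raw_day
    return DecodedFields(fmt, year, day_code, gender, serial, check, voting)
```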

NICError

Bases: HelakitError

Base class for NIC-related errors raised by helakit.

NICFormatError

Bases: NICError

Raised when an input cannot be parsed as either NIC format.

Used by helakit.nic.convert_nic: a failed conversion has no sensible ValidationResult to return, so the failure must propagate as an exception.

NICSummary dataclass

NICSummary(total: int, valid: int, invalid: int, duplicate_groups: int, duplicate_rows: int, dob_mismatches: int, gender_mismatches: int)

Aggregate counts for a batch validation run.

Attributes:

Name Type Description
total int

Number of input rows processed.

valid int

Rows whose NIC passed structural validation.

invalid int

Rows that failed structural validation.

duplicate_groups int

Number of distinct NICs that appear in more than one row (after canonicalisation that strips V/X suffixes).

duplicate_rows int

Total rows participating in any duplicate group.

dob_mismatches int

Rows where the supplied DOB differed from the NIC-decoded DOB (only counted when both were available).

gender_mismatches int

Same idea for gender.
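The validity and duplicate counts can be derived from per-row validity flags and a duplicates mapping of the kind held on NICBatchResult. A minimal sketch, assuming those two inputs are already available; summarize and its plain-dict return value are hypothetical, and the mismatch counters are omitted.

```python
def summarize(valid_flags: list[bool], duplicates: dict[str, list[int]]) -> dict[str, int]:
    """Roll per-row outcomes up into the counts described above."""
    valid = sum(valid_flags)
    return {
        "total": len(valid_flags),
        "valid": valid,
        "invalid": len(valid_flags) - valid,
        # One group per canonical NIC that appears more than once.
        "duplicate_groups": len(duplicates),
        # Every row that belongs to any group.
        "duplicate_rows": sum(len(rows) for rows in duplicates.values()),
    }
```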

NicResult dataclass

NicResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())

Bases: ValidationResult

Validation result returned by validate_nic.

Adds typed property accessors for every field the NIC validator extracts, including pass-through accessors that reach into NICDecoded so the most commonly used fields are one attribute away:

result.decoded.dob   # works
result.dob           # also works — same value

Properties return None on invalid results so attribute access never raises — guard with if result: before reading.
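One way such pass-through accessors could be written is shown below. This is an illustrative sketch only: Decoded and Result are hypothetical stand-ins for NICDecoded and NicResult, reduced to a single field.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Decoded:
    """Hypothetical stand-in for NICDecoded with one field."""
    dob: date


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for NicResult."""
    is_valid: bool
    data: dict[str, Any] = field(default_factory=dict)

    @property
    def decoded(self) -> Optional[Decoded]:
        # A missing key on invalid results yields None rather than KeyError.
        return self.data.get("decoded")

    @property
    def dob(self) -> Optional[date]:
        # Pass-through: reach into the decoded payload only when it exists.
        decoded = self.decoded
        return decoded.dob if decoded is not None else None
```

Both result.decoded.dob and result.dob read the same value on a valid result, while an invalid result yields None from either property.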

age property

age: int | None

Age in completed years at validation time. None if invalid.

decoded property

decoded: NICDecoded | None

Full NICDecoded payload. None if invalid.

dob property

dob: date | None

Decoded date of birth. None if invalid.

dob_match property

dob_match: bool | None

True / False if a DOB was cross-checked; None otherwise.

format property

format: Literal['old', 'new'] | None

"old" or "new". None if invalid.

gender property

gender: Literal['male', 'female'] | None

"male" or "female". None if invalid.

gender_match property

gender_match: bool | None

True / False if a gender was cross-checked; None otherwise.

mismatch_detail property

mismatch_detail: str | None

Human-readable diff of cross-check vs decoded. None when no cross-check ran or everything matched.

mismatch_reasons property

mismatch_reasons: list[str] | None

Which cross-check fields disagreed with the NIC. None when no cross-check ran.

serial property

serial: int | None

Serial number assigned on the registration day. None if invalid.

voting_eligible property

voting_eligible: bool | None

True / False for old NICs; None for new NICs or invalid results.

year property

year: int | None

Full birth year. None if invalid.

PhoneDecoded dataclass

PhoneDecoded(carrier: str, line_type: LineType, local: str)

Structured metadata about a recognised Sri Lankan phone number.

Returned in PhoneResult.data["decoded"] and also accessible as PhoneResult.decoded. Bundles the three pieces of information that can be derived from a number's prefix.

Attributes:

Name Type Description
carrier str

Network operator name (e.g. "Dialog", "Mobitel", "SLT"). Determined by the three-digit local prefix.

line_type LineType

Either "mobile" or "fixed".

local str

The 10-digit local form ("0XXXXXXXXX"). The international form is on the result as normalized.

Example

>>> result = validate_phone("+94712345678")
>>> result.decoded
PhoneDecoded(carrier='Mobitel', line_type='mobile', local='0712345678')

PhoneResult dataclass

PhoneResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())

Bases: ValidationResult

Validation result returned by validate_phone.

Adds typed property accessors for every field the phone validator extracts. The underlying data dict is still populated for backwards compatibility, so every access style works:

result.carrier        # typed property — preferred
result["carrier"]     # dict-style access
result.data["carrier"]  # original form

Properties return None on invalid results so attribute access never raises — guard with if result: or if result.is_valid:.

carrier property

carrier: str | None

Network operator name, e.g. "Dialog". None if invalid.

decoded property

decoded: PhoneDecoded | None

Full PhoneDecoded payload. None if invalid.

line_type property

line_type: LineType | None

"mobile" or "fixed". None if invalid.

local property

local: str | None

10-digit local form "0XXXXXXXXX". None if invalid.

PostalDecoded dataclass

PostalDecoded(district: str, province: str, post_office: str | None = None)

Structured metadata about a recognised Sri Lankan postal code.

Returned in PostalResult.data["decoded"] and accessible as PostalResult.decoded.

Attributes:

Name Type Description
district str

The district the code belongs to (e.g. "Colombo").

province str

The province the district is in (e.g. "Western").

post_office str | None

The specific post office name, when known.

PostalResult dataclass

PostalResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())

Bases: ValidationResult

Validation result returned by validate_postal.

Typed property accessors mirror the keys placed in data by the validator. Properties return None on invalid results.

Planned API

Postal-code validation is not implemented yet; this class is wired up in advance so the planned shape can be documented and imported. Calling validate_postal currently raises NotImplementedError.

decoded property

decoded: PostalDecoded | None

Full PostalDecoded payload. None if invalid.

district property

district: str | None

District name. None if invalid.

post_office property

post_office: str | None

Post office name, when known. None otherwise.

province property

province: str | None

Province name. None if invalid.

ValidationError dataclass

ValidationError(code: str, message: str, field: str | None = None)

A single validation failure.

Attributes:

Name Type Description
code str

A short machine-readable identifier (e.g. "nic.bad_checksum"). Codes are namespaced by domain and are stable across releases — check them in tests instead of matching on message.

message str

A human-readable description of what went wrong. Safe to show to end users.

field str | None

The specific field within the input that failed, if applicable. None when the failure is not tied to a single field.

Example

>>> err = ValidationError(
...     code="phone.invalid_length",
...     message="Sri Lankan phone numbers must be 10 digits.",
...     field="value",
... )
>>> err.code
'phone.invalid_length'
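Since codes are described as stable while messages are meant for people, test assertions are best written against the code. The sketch below uses a local stand-in dataclass with the three documented fields; has_error is a hypothetical helper, not something helakit provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValidationError:
    """Stand-in mirroring the three documented fields."""
    code: str
    message: str
    field: Optional[str] = None


def has_error(errors: list[ValidationError], code: str) -> bool:
    """Hypothetical test helper: match on the stable code, never on the message."""
    return any(err.code == code for err in errors)
```

A test written this way keeps passing if the wording of message is later edited.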

ValidationResult dataclass

ValidationResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())

The outcome of validating a single value.

A ValidationResult is immutable and truthy when valid, so it drops naturally into if statements:

if validate_phone("0712345678"):
    ...

Domain-specific subclasses (PhoneResult, NicResult, PostalResult) add typed properties on top of this shape so you can write result.carrier instead of result.data["carrier"].

Attributes:

Name Type Description
is_valid bool

True if the input passed every check.

value str

The original input string, unmodified. Useful for echoing user input back in error messages.

normalized str | None

The canonical representation of value when valid (for example "+94712345678" for a phone number). None on invalid results.

errors list[ValidationError]

Every error encountered. Empty list when is_valid is True. Errors are reported in the order they were detected; validators short-circuit on the first hard failure.

data dict[str, Any]

Structured fields extracted during validation (e.g. NIC date of birth, phone carrier). Empty when no fields could be extracted. Subclasses surface these as typed properties.

Example

>>> from helakit import validate_phone
>>> result = validate_phone("0712345678")
>>> result.is_valid
True
>>> result.normalized
'+94712345678'
>>> result["carrier"]  # dict-style access
'Mobitel'
>>> result.carrier  # attribute access (PhoneResult)
'Mobitel'
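The truthiness and dict-style behaviour documented for this class can be reproduced with a handful of special methods. The following is a sketch of one possible design, not helakit's source; Result is a hypothetical stand-in and omits the errors field.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in with the documented shape."""
    is_valid: bool
    value: str
    normalized: Optional[str] = None
    data: dict[str, Any] = field(default_factory=dict)

    def __bool__(self) -> bool:
        # Truthy only when validation passed.
        return self.is_valid

    def __getitem__(self, key: str) -> Any:
        # Raises KeyError for fields that were not extracted.
        return self.data[key]

    def __contains__(self, key: object) -> bool:
        return key in self.data

    def __iter__(self) -> Iterator[str]:
        return iter(self.data)

    def get(self, key: str, default: Any = None) -> Any:
        # Non-raising counterpart to result[key].
        return self.data.get(key, default)
```

Defining __bool__ explicitly matters here: without it, Python would fall back on other protocols to decide truthiness rather than on is_valid.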

__contains__

__contains__(key: object) -> bool

Return True if key is a field that was extracted.

__getitem__

__getitem__(key: str) -> Any

Look up an extracted field by name.

Mirrors the dict-style access pattern people expect from libraries like pandas. Raises KeyError if the field was not extracted (e.g. on invalid input).

Parameters:

Name Type Description Default
key str

The name of the field to read from data.

required

Returns:

Type Description
Any

The value stored in data[key].

Raises:

Type Description
KeyError

If key is not present in data.

Example

>>> result = validate_phone("0712345678")
>>> result["line_type"]
'mobile'

__iter__

__iter__() -> Iterator[str]

Iterate over extracted field names, like a dict.

get

get(key: str, default: Any = None) -> Any

Return data[key] if present, otherwise default.

The non-raising counterpart to result[key] — handy when you only want a field if validation actually extracted it.

Parameters:

Name Type Description Default
key str

The field name to look up.

required
default Any

Returned when key is not in data.

None

Example

>>> result = validate_phone("0712345678")
>>> result.get("carrier")
'Mobitel'
>>> result.get("missing", "n/a")
'n/a'

convert_nic

convert_nic(value: str, *, century: int = ...) -> str
convert_nic(value: list[str] | tuple[str, ...], *, century: int = ..., errors: ErrorMode = ...) -> list[str | None]
convert_nic(value: Any, *, century: int = ..., nic_col: str | None = ..., errors: ErrorMode = ..., error_col: str | None = ...) -> Any
convert_nic(value: Any, *, century: int = DEFAULT_OLD_NIC_CENTURY, nic_col: str | None = None, errors: ErrorMode = 'raise', error_col: str | None = None) -> Any

Convert an old-format NIC to the new 12-digit format.

Parameters:

Name Type Description Default
value Any

A single NIC string, a list of strings, a pandas or polars Series, or a pandas or polars DataFrame. For Series input the function returns a Series of the same length, preserving index and name. For DataFrame input nic_col specifies which column to convert; the function returns a copy of the frame with a new nic_converted column added.

required
century int

Century to assume for two-digit years on old NICs.

DEFAULT_OLD_NIC_CENTURY
nic_col str | None

Required for tabular input.

None
errors ErrorMode

How to handle individual values that cannot be converted. "raise" (default) propagates NICFormatError on the first bad value, the same as pandas strict mode. "coerce" replaces bad values with None, so a single malformed row does not fail the whole batch. "ignore" passes the original input through untouched. Scalar input always raises regardless of this setting.

'raise'
error_col str | None

Column name for per-row error messages (DataFrame input only). When supplied, the returned frame gets an extra column with the failure message or None for rows that converted cleanly. Implies errors="coerce" if errors is left at the default of "raise".

None

Returns:

Type Description
Any

The same shape as the input: string for string, list for list, DataFrame for DataFrame.

Raises:

Type Description
NICFormatError

If a value cannot be parsed (e.g. wrong length, non-numeric content) and errors="raise". New-format input passes through unchanged.

InvalidInputError

For unsupported input types or invalid errors values.
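The three errors modes are easiest to see on a plain list. The sketch below is not helakit's implementation: convert_one and convert_many are hypothetical, the old-to-new mapping (full year, day code, a "0" pad, then serial and check digit) is an assumption based on the commonly described conversion, and an unknown errors value raises ValueError here rather than InvalidInputError.

```python
from typing import Optional


class NICFormatError(Exception):
    """Stand-in for the library's exception of the same name."""


def _ascii_digits(text: str) -> bool:
    return text.isascii() and text.isdigit()


def convert_one(value: str, century: int = 1900) -> str:
    """Convert one old-format NIC; new-format input passes through unchanged."""
    text = value.upper()
    if len(text) == 12 and _ascii_digits(text):
        return text
    if len(text) != 10 or text[9] not in "VX" or not _ascii_digits(text[:9]):
        raise NICFormatError(f"cannot parse {value!r} as a NIC")
    digits = text[:9]
    # Assumed mapping: full year + day code + "0" + serial and check digit.
    return f"{century + int(digits[:2])}{digits[2:5]}0{digits[5:]}"


def convert_many(values: list[str], errors: str = "raise") -> list[Optional[str]]:
    """Apply convert_one to each value, honouring the errors mode."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    out: list[Optional[str]] = []
    for value in values:
        try:
            out.append(convert_one(value))
        except NICFormatError:
            if errors == "raise":
                raise
            # "coerce" substitutes None; "ignore" keeps the original value.
            out.append(None if errors == "coerce" else value)
    return out
```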

is_valid_nic

is_valid_nic(value: str, *, format: FormatHint = 'any') -> bool

Return True if value is a structurally valid NIC.

Scalar-only. For batch checks use validate_nic and inspect each ValidationResult.

Raises:

Type Description
InvalidInputError

If value is not a string.

is_valid_phone

is_valid_phone(value: str) -> bool

Return True if value is a valid Sri Lankan phone number.

Boolean shorthand for validate_phone. Use this when you only need a yes/no answer; use validate_phone when you also need carrier metadata or the normalized form.

Parameters:

Name Type Description Default
value str

The phone number to check, in local or international form.

required

Returns:

Type Description
bool

True when the number is valid, False otherwise.

Raises:

Type Description
InvalidInputError

If value is not a string.

Example

>>> is_valid_phone("0712345678")
True
>>> is_valid_phone("0001234567")
False

is_valid_postal

is_valid_postal(value: str) -> bool

Return True if value is a valid Sri Lankan postal code.

Raises:

Type Description
NotImplementedError

Postal-code validation has not been implemented yet.

validate_nic

validate_nic(data: str, *, format: FormatHint = ..., century: int = ...) -> NicResult
validate_nic(data: list[str] | list[dict[str, Any]] | tuple[str, ...], *, format: FormatHint = ..., nic_col: str | None = ..., dob_col: str | None = ..., gender_col: str | None = ..., century: int = ..., errors: BatchErrorMode = ...) -> NICBatchResult
validate_nic(data: Any, *, format: FormatHint = ..., nic_col: str | None = ..., dob_col: str | None = ..., gender_col: str | None = ..., century: int = ..., errors: BatchErrorMode = ...) -> NICBatchResult
validate_nic(data: Any, *, format: FormatHint = 'any', nic_col: str | None = None, dob_col: str | None = None, gender_col: str | None = None, century: int = DEFAULT_OLD_NIC_CENTURY, errors: BatchErrorMode = 'raise') -> NicResult | NICBatchResult

Validate one or many Sri Lankan NIC numbers.

Parameters:

Name Type Description Default
data Any

A single NIC string, a list[str], a list[dict], a pandas DataFrame, or a polars DataFrame.

required
format FormatHint

Restrict to "old" or "new" only. "any" (default) accepts both.

'any'
nic_col str | None

Column name holding NICs (required for tabular input).

None
dob_col str | None

Column name holding dates of birth. When supplied each row is cross-checked and the per-row result records whether the decoded DOB matched.

None
gender_col str | None

Column name holding gender (M/F/Male/Female, case-insensitive).

None
century int

Century to assume for two-digit years on old NICs. Defaults to 1900.

DEFAULT_OLD_NIC_CENTURY
errors BatchErrorMode

Batch-only. "raise" (default) propagates InvalidInputError if any cross-check dob or gender value is unparseable, matching strict pandas semantics. "coerce" records the failure as a per-row error in ValidationResult.errors (and the nic_errors column on DataFrame output), so a single malformed row does not abort the whole batch.

'raise'

Returns:

Type Description
NicResult | NICBatchResult

A NicResult for scalar input, or a NICBatchResult for any iterable input.

Raises:

Type Description
InvalidInputError

For unsupported input types, an invalid errors value, or unparseable gender / DOB values when errors="raise".
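The gender cross-check and its two errors modes might look roughly like this. It is a sketch under assumptions: parse_gender and gender_match are hypothetical names, the accepted spellings are the four listed for gender_col, and ValueError stands in for InvalidInputError.

```python
from typing import Optional


def parse_gender(raw: str) -> str:
    """Normalise M/F/Male/Female (any case) to 'male' or 'female'."""
    key = raw.strip().lower()
    if key in ("m", "male"):
        return "male"
    if key in ("f", "female"):
        return "female"
    raise ValueError(f"unparseable gender: {raw!r}")


def gender_match(
    decoded: str, supplied: Optional[str], errors: str = "raise"
) -> tuple[Optional[bool], Optional[str]]:
    """Return (match, error_message); match is None when no cross-check ran."""
    if supplied is None:
        return None, None
    try:
        return parse_gender(supplied) == decoded, None
    except ValueError as exc:
        if errors == "raise":
            raise
        # "coerce": record the failure for this row and carry on.
        return None, str(exc)
```

With errors="coerce" the unparseable value becomes a per-row message instead of stopping the batch.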

validate_phone

validate_phone(value: str) -> PhoneResult

Validate a Sri Lankan phone number.

The validator accepts numbers in three input forms — local ("0712345678"), international with leading + ("+94712345678"), and international without the + ("94712345678") — and tolerates spaces, hyphens, and parentheses inside the number (they are stripped before validation).

Parameters:

Name Type Description Default
value str

The phone number as a string. Must contain ASCII digits only (optionally prefixed with +); Unicode digits and other characters cause a phone.invalid_characters error.

required

Returns:

Type Description
PhoneResult

A PhoneResult. When valid, normalized holds the canonical "+94XXXXXXXXX" form and the following typed properties are populated:

  • carrier: network operator name (e.g. "Mobitel").
  • line_type: "mobile" or "fixed".
  • local: the "0XXXXXXXXX" local representation.
  • decoded: a PhoneDecoded bundling the above.

On invalid input, is_valid is False, normalized is None, and errors contains one or more ValidationError objects with codes from the Error codes list below.

Raises:

Type Description
InvalidInputError

If value is not a string. This is a programmer error (not a validation failure) — passing None or an int is treated as misuse, not as bad data.

Error codes
  • phone.invalid_characters — input contains characters other than ASCII digits and an optional leading +.
  • phone.missing_prefix — input has no recognisable leading 0, +94, or 94.
  • phone.invalid_length — input has the right shape but the wrong number of digits (must be 10 in local form).
  • phone.unknown_prefix — the three-digit local prefix is not a Sri Lankan mobile or fixed-line prefix.
Example

Basic validation:

>>> result = validate_phone("0712345678")
>>> result.is_valid
True
>>> result.normalized
'+94712345678'
>>> result.carrier
'Mobitel'
>>> result.line_type
'mobile'

International form:

>>> validate_phone("+94772345678").carrier
'Dialog'

Formatting tolerance:

>>> validate_phone("071 234 5678").is_valid
True
>>> validate_phone("(071) 234-5678").is_valid
True

Handling invalid input:

>>> result = validate_phone("0001234567")
>>> result.is_valid
False
>>> result.errors[0].code
'phone.unknown_prefix'
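The normalisation steps and error codes described for validate_phone can be sketched as a single function. This is an approximation, not the library's code: normalize_phone is a hypothetical name, the order of the checks is assumed, and KNOWN_PREFIXES holds only the two prefixes that appear in the examples on this page.

```python
import re
from typing import Optional

# Only the prefixes seen in this page's examples; a real table would be longer.
KNOWN_PREFIXES = {"071": "Mobitel", "077": "Dialog"}


def normalize_phone(value: str) -> tuple[Optional[str], Optional[str]]:
    """Return (normalized, error_code); exactly one of the two is None."""
    # Drop the separators the validator tolerates: spaces, hyphens, parentheses.
    text = re.sub(r"[ \-()]", "", value)
    if not re.fullmatch(r"\+?[0-9]+", text):
        return None, "phone.invalid_characters"
    # Reduce all three accepted input forms to the local "0XXXXXXXXX" form.
    if text.startswith("+94"):
        local = "0" + text[3:]
    elif text.startswith("94"):
        local = "0" + text[2:]
    elif text.startswith("0"):
        local = text
    else:
        return None, "phone.missing_prefix"
    if len(local) != 10:
        return None, "phone.invalid_length"
    if local[:3] not in KNOWN_PREFIXES:
        return None, "phone.unknown_prefix"
    return "+94" + local[1:], None
```

The character class [0-9] matches ASCII digits only, which is what keeps Unicode digits out.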

validate_postal

validate_postal(value: str) -> PostalResult

Validate a Sri Lankan postal code.

Parameters:

Name Type Description Default
value str

A five-digit postal code.

required

Returns:

Type Description
PostalResult

A PostalResult with the matching district and province populated when valid. Accessible as result.district / result.province or via result["district"].

Raises:

Type Description
NotImplementedError

Postal-code validation has not been implemented yet.