helakit
Helakit — a toolkit for validating and working with Sri Lankan data.
HelakitError
Bases: Exception
Base class for every exception raised by helakit.
InvalidInputError
Bases: HelakitError
Raised when an input is the wrong type or otherwise unusable.
This signals a programmer error, not a validation failure. A malformed
NIC string returns a :class:ValidationResult with is_valid=False;
passing None instead of a string raises this.
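The split between a raised error and a failed result can be sketched with stand-in classes. Everything below is hypothetical: the trimmed `Result` dataclass, the `check_nic` helper, and its length-based shape test are illustrations of the behaviour described above, not helakit's implementation.

```python
from dataclasses import dataclass


class HelakitError(Exception):
    """Stand-in for helakit's base exception."""


class InvalidInputError(HelakitError):
    """Stand-in: raised for programmer errors such as a non-string input."""


@dataclass(frozen=True)
class Result:
    """Hypothetical, trimmed-down stand-in for ValidationResult."""
    is_valid: bool
    value: str


def check_nic(value):
    # A wrong type is a programmer error, so it raises.
    if not isinstance(value, str):
        raise InvalidInputError(f"expected str, got {type(value).__name__}")
    # A malformed string is ordinary bad data, so it is reported in the result.
    text = value.strip().upper()
    digits_only = text.isascii() and text.isdigit()
    old_shape = (
        len(text) == 10
        and text.isascii()
        and text[:9].isdigit()
        and text[9] in "VX"
    )
    return Result(is_valid=(len(text) == 12 and digits_only) or old_shape, value=value)
```

Callers can then reserve `try`/`except` for genuine bugs and use a plain `if` on the result for bad data.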
NICBatchResult
dataclass
NICBatchResult(results: list[NicResult], duplicates: dict[str, list[int]] = dict(), summary: NICSummary = (lambda: NICSummary(0, 0, 0, 0, 0, 0, 0))(), df: Any | None = None)
The outcome of validating a list / dataframe of NICs.
Attributes:
| Name | Type | Description |
|---|---|---|
| results | list[NicResult] | One :class:NicResult per input row. |
| duplicates | dict[str, list[int]] | Mapping from canonical NIC (uppercased, V/X stripped) to the row indices it appeared at. Only entries with two or more indices are included. |
| summary | NICSummary | Roll-up counts. |
| df | Any \| None | When the input was a pandas or polars DataFrame, this is a copy of that frame with helakit's per-row columns appended. |
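A minimal sketch of how a `duplicates` mapping of this shape could be built. The helper names `canonical_nic` and `find_duplicates` are hypothetical, and the canonicalisation rule is taken only from the description above (uppercase, then drop a trailing V/X); helakit may do more.

```python
from collections import defaultdict


def canonical_nic(nic: str) -> str:
    """Uppercase the NIC and drop a trailing V/X suffix (assumed rule)."""
    text = nic.strip().upper()
    return text[:-1] if text.endswith(("V", "X")) else text


def find_duplicates(nics: list[str]) -> dict[str, list[int]]:
    """Map each canonical NIC to the row indices where it appears, keeping only repeats."""
    seen: dict[str, list[int]] = defaultdict(list)
    for index, nic in enumerate(nics):
        seen[canonical_nic(nic)].append(index)
    # Only entries with two or more indices count as duplicates.
    return {key: rows for key, rows in seen.items() if len(rows) >= 2}


rows = ["853400937V", "853400937v", "199012345678", "853400937X"]
duplicates = find_duplicates(rows)
```

Here rows 0, 1, and 3 collapse to the same canonical key, while the single new-format NIC is left out of the mapping.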
NICDecoded
dataclass
NICDecoded(format: Literal['old', 'new'], dob: date, gender: Literal['male', 'female'], age: int, year: int, day_code: int, serial: int, check_digit: int, voting_eligible: bool | None)
Structured fields extracted from a valid NIC.
Attributes:
| Name | Type | Description |
|---|---|---|
| format | Literal['old', 'new'] | Either "old" or "new". |
| dob | date | Decoded date of birth. |
| gender | Literal['male', 'female'] | Either "male" or "female". |
| age | int | Age in completed years at the time of validation. |
| year | int | Full birth year. |
| day_code | int | Day-of-year encoding with the female 500 offset removed. |
| serial | int | Serial number assigned on the registration day. |
| check_digit | int | Final check digit (not currently verified; see :mod: …). |
| voting_eligible | bool \| None | True / False for old NICs; None for new NICs. |
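The positional fields can be illustrated with a small sketch. It assumes the commonly described layouts (old: two-digit year, three-digit day code, three-digit serial, check digit, V/X letter; new: four-digit year, three-digit day code, four-digit serial, check digit), that a V suffix means voting eligible, and a `century` expressed as a base year such as 1900. `DecodedFields` and `decode_fields` are hypothetical names; `dob` and `age` are left out because turning a day code into a calendar date depends on rules not stated here.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class DecodedFields:
    """Hypothetical subset of NICDecoded; dob and age are left out."""
    format: Literal["old", "new"]
    gender: Literal["male", "female"]
    year: int
    day_code: int
    serial: int
    check_digit: int
    voting_eligible: Optional[bool]


def decode_fields(nic: str, century: int = 1900) -> DecodedFields:
    """Split a structurally valid NIC into its positional fields."""
    text = nic.strip().upper()
    if len(text) == 10:  # old: YY DDD SSS C + V/X
        year = century + int(text[0:2])
        raw_day, serial, check = int(text[2:5]), int(text[5:8]), int(text[8])
        fmt, voting = "old", text[9] == "V"
    else:  # new: YYYY DDD SSSS C
        year = int(text[0:4])
        raw_day, serial, check = int(text[4:7]), int(text[7:11]), int(text[11])
        fmt, voting = "new", None
    female = raw_day > 500  # day codes above 500 mark a female holder
    return DecodedFields(
        format=fmt,
        gender="female" if female else "male",
        year=year,
        day_code=raw_day - 500 if female else raw_day,
        serial=serial,
        check_digit=check,
        voting_eligible=voting,
    )
```

The function expects input that already passed structural validation; it does no checking of its own.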
NICError
Bases: HelakitError
Base class for NIC-related errors raised by helakit.
NICFormatError
Bases: NICError
Raised when an input cannot be parsed as either NIC format.
Used by :func:helakit.nic.convert_nic because conversion has no
sensible ValidationResult to return — failure must propagate as
an exception.
NICSummary
dataclass
NICSummary(total: int, valid: int, invalid: int, duplicate_groups: int, duplicate_rows: int, dob_mismatches: int, gender_mismatches: int)
Aggregate counts for a batch validation run.
Attributes:
| Name | Type | Description |
|---|---|---|
| total | int | Number of input rows processed. |
| valid | int | Rows whose NIC passed structural validation. |
| invalid | int | Rows that failed structural validation. |
| duplicate_groups | int | Number of distinct NICs that appear in more than one row (after canonicalisation that strips V/X suffixes). |
| duplicate_rows | int | Total rows participating in any duplicate group. |
| dob_mismatches | int | Rows where the supplied DOB differed from the NIC-decoded DOB (only counted when both were available). |
| gender_mismatches | int | Same idea for gender. |
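The two duplicate counters follow directly from a `duplicates`-style mapping. The sketch below assumes that relationship; `duplicate_counts` is a hypothetical helper, not part of helakit.

```python
def duplicate_counts(duplicates: dict[str, list[int]]) -> tuple[int, int]:
    """Return (duplicate_groups, duplicate_rows) for a canonical-NIC -> row-indices mapping."""
    groups = len(duplicates)  # one group per repeated NIC
    rows = sum(len(indices) for indices in duplicates.values())  # every row in any group
    return groups, rows


sample = {"853400937": [0, 1, 3], "199012345678": [2, 5]}
groups, rows = duplicate_counts(sample)
```

With the sample mapping there are two groups covering five rows in total.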
NicResult
dataclass
NicResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())
Bases: ValidationResult
Validation result returned by :func:~helakit.nic.validate_nic.
Adds typed property accessors for every field the NIC validator
extracts, including pass-through accessors that reach into
:class:NICDecoded so the most commonly-used fields are one
attribute away::
result.decoded.dob # works
result.dob # also works — same value
Properties return None on invalid results so attribute access
never raises — guard with if result: before reading.
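A pass-through property of this kind might look like the sketch below. `Decoded` and `Result` are hypothetical stand-ins with a single field each; only the pattern (look in `data`, fall back to None) is meant to match the description above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Decoded:
    """Hypothetical stand-in for NICDecoded with a single field."""
    dob: date


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for NicResult."""
    is_valid: bool
    data: dict[str, Any] = field(default_factory=dict)

    def __bool__(self) -> bool:
        return self.is_valid

    @property
    def decoded(self) -> Optional[Decoded]:
        # Missing on invalid results, so this yields None instead of raising.
        return self.data.get("decoded")

    @property
    def dob(self) -> Optional[date]:
        # Pass-through: reach into the decoded payload when there is one.
        decoded = self.decoded
        return decoded.dob if decoded is not None else None


good = Result(True, {"decoded": Decoded(dob=date(1985, 12, 5))})
bad = Result(False)
```

Reading `bad.dob` returns None rather than raising, which is why a truthiness check belongs before any use of the value.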
age
property
age: int | None
Age in completed years at validation time. None if invalid.
decoded
property
decoded: NICDecoded | None
Full :class:NICDecoded payload. None if invalid.
dob
property
dob: date | None
Decoded date of birth. None if invalid.
dob_match
property
dob_match: bool | None
True / False if a DOB was cross-checked; None otherwise.
format
property
format: Literal['old', 'new'] | None
"old" or "new". None if invalid.
gender
property
gender: Literal['male', 'female'] | None
"male" or "female". None if invalid.
gender_match
property
gender_match: bool | None
True / False if a gender was cross-checked; None otherwise.
mismatch_detail
property
mismatch_detail: str | None
Human-readable diff of cross-check vs decoded. None when
no cross-check ran or everything matched.
mismatch_reasons
property
mismatch_reasons: list[str] | None
Which cross-check fields disagreed with the NIC. None when
no cross-check ran.
serial
property
serial: int | None
Serial number assigned on the registration day. None if invalid.
voting_eligible
property
voting_eligible: bool | None
True / False for old NICs; None for new NICs or
invalid results.
year
property
year: int | None
Full birth year. None if invalid.
PhoneDecoded
dataclass
PhoneDecoded(carrier: str, line_type: LineType, local: str)
Structured metadata about a recognised Sri Lankan phone number.
Returned in PhoneResult.data["decoded"] and also accessible as
PhoneResult.decoded. Bundles the three pieces of information that
can be derived from a number's prefix.
Attributes:
| Name | Type | Description |
|---|---|---|
| carrier | str | Network operator name (e.g. "Dialog"). |
| line_type | LineType | Either "mobile" or "fixed". |
| local | str | The 10-digit local form ("0XXXXXXXXX"). |
Example
>>> result = validate_phone("+94712345678")
>>> result.decoded
PhoneDecoded(carrier='Mobitel', line_type='mobile', local='0712345678')
PhoneResult
dataclass
PhoneResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())
Bases: ValidationResult
Validation result returned by :func:~helakit.phone.validate_phone.
Adds typed property accessors for every field the phone validator
extracts. The underlying data dict is still populated for
backwards compatibility, so every access style works::
result.carrier # typed property — preferred
result["carrier"] # dict-style access
result.data["carrier"] # original form
Properties return None on invalid results so attribute access
never raises — guard with if result: or if result.is_valid:.
carrier
property
carrier: str | None
Network operator name, e.g. "Dialog". None if invalid.
decoded
property
decoded: PhoneDecoded | None
Full :class:PhoneDecoded payload. None if invalid.
line_type
property
line_type: LineType | None
"mobile" or "fixed". None if invalid.
local
property
local: str | None
10-digit local form "0XXXXXXXXX". None if invalid.
PostalDecoded
dataclass
PostalDecoded(district: str, province: str, post_office: str | None = None)
Structured metadata about a recognised Sri Lankan postal code.
Returned in PostalResult.data["decoded"] and accessible as
PostalResult.decoded.
Attributes:
| Name | Type | Description |
|---|---|---|
| district | str | The district the code belongs to. |
| province | str | The province the district is in. |
| post_office | str \| None | The specific post office name, when known. |
PostalResult
dataclass
PostalResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())
Bases: ValidationResult
Validation result returned by
:func:~helakit.postal.validate_postal.
Typed property accessors mirror the keys placed in data by the
validator. Properties return None on invalid results.
Planned API
Postal-code validation is not implemented yet; this class is
wired up in advance so the planned shape can be documented and
imported. Calling :func:~helakit.postal.validate_postal
currently raises NotImplementedError.
decoded
property
decoded: PostalDecoded | None
Full :class:PostalDecoded payload. None if invalid.
district
property
district: str | None
District name. None if invalid.
post_office
property
post_office: str | None
Post office name, when known. None otherwise.
province
property
province: str | None
Province name. None if invalid.
ValidationError
dataclass
ValidationError(code: str, message: str, field: str | None = None)
A single validation failure.
Attributes:
| Name | Type | Description |
|---|---|---|
| code | str | A short machine-readable identifier (e.g. "phone.invalid_length"). |
| message | str | A human-readable description of what went wrong. Safe to show to end users. |
| field | str \| None | The specific field within the input that failed, if applicable. |
Example
>>> err = ValidationError(
...     code="phone.invalid_length",
...     message="Sri Lankan phone numbers must be 10 digits.",
...     field="value",
... )
>>> err.code
'phone.invalid_length'
ValidationResult
dataclass
ValidationResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())
The outcome of validating a single value.
A ValidationResult is immutable and truthy when valid, so it
drops naturally into if statements::
if validate_phone("0712345678"):
...
Domain-specific subclasses (:class:~helakit.phone.PhoneResult,
:class:~helakit.nic.NicResult, :class:~helakit.postal.PostalResult)
add typed properties on top of this shape so you can write
result.carrier instead of result.data["carrier"].
Attributes:
| Name | Type | Description |
|---|---|---|
| is_valid | bool | Whether the value passed validation. |
| value | str | The original input string, unmodified. Useful for echoing user input back in error messages. |
| normalized | str \| None | The canonical representation of value. |
| errors | list[ValidationError] | Every error encountered. Empty list when is_valid is True. |
| data | dict[str, Any] | Structured fields extracted during validation (e.g. NIC date of birth, phone carrier). Empty when no fields could be extracted. Subclasses surface these as typed properties. |
Example
>>> from helakit import validate_phone
>>> result = validate_phone("0712345678")
>>> result.is_valid
True
>>> result.normalized
'+94712345678'
>>> result["carrier"]  # dict-style access
'Mobitel'
>>> result.carrier  # attribute access (PhoneResult)
'Mobitel'
__contains__
__contains__(key: object) -> bool
Return True if key is a field that was extracted.
__getitem__
__getitem__(key: str) -> Any
Look up an extracted field by name.
Mirrors the dict-style access pattern people expect from libraries
like pandas. Raises :class:KeyError if the field was not
extracted (e.g. on invalid input).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | The name of the field to read from data. | required |
Returns:
| Type | Description |
|---|---|
| Any | The value stored in data under key. |
Raises:
| Type | Description |
|---|---|
| KeyError | If key was not extracted. |
Example
>>> result = validate_phone("0712345678")
>>> result["line_type"]
'mobile'
__iter__
__iter__() -> Iterator[str]
Iterate over extracted field names, like a dict.
get
get(key: str, default: Any = None) -> Any
Return data[key] if present, otherwise default.
The non-raising counterpart to result[key] — handy when you
only want a field if validation actually extracted it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | The field name to look up. | required |
| default | Any | Returned when key is not present. | None |
Example
>>> result = validate_phone("0712345678")
>>> result.get("carrier")
'Mobitel'
>>> result.get("missing", "n/a")
'n/a'
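Taken together, the truthiness rule and the dict-style methods amount to a small read-only mapping protocol over `data`. The class below is a hypothetical stand-in written to show that protocol; it is not helakit's source and it omits `normalized` and `errors`.

```python
from collections.abc import Iterator
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for ValidationResult; not helakit's code."""
    is_valid: bool
    value: str
    data: dict[str, Any] = field(default_factory=dict)

    def __bool__(self) -> bool:
        return self.is_valid  # truthy only when validation passed

    def __getitem__(self, key: str) -> Any:
        return self.data[key]  # KeyError when the field was not extracted

    def __contains__(self, key: object) -> bool:
        return key in self.data

    def __iter__(self) -> Iterator[str]:
        return iter(self.data)  # yields field names, like a dict

    def get(self, key: str, default: Any = None) -> Any:
        return self.data.get(key, default)


ok = Result(True, "0712345678", {"carrier": "Mobitel", "line_type": "mobile"})
bad = Result(False, "0001234567")
```

Because the dataclass is frozen, assigning to `ok.value` raises, which is one way to get the immutability described for the real class.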
convert_nic
convert_nic(value: str, *, century: int = ...) -> str
convert_nic(value: list[str] | tuple[str, ...], *, century: int = ..., errors: ErrorMode = ...) -> list[str | None]
convert_nic(value: Any, *, century: int = ..., nic_col: str | None = ..., errors: ErrorMode = ..., error_col: str | None = ...) -> Any
convert_nic(value: Any, *, century: int = DEFAULT_OLD_NIC_CENTURY, nic_col: str | None = None, errors: ErrorMode = 'raise', error_col: str | None = None) -> Any
Convert an old-format NIC to the new 12-digit format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Any | Either a single NIC string, a list of strings, a pandas or polars Series, or a pandas/polars DataFrame. For Series input the function returns a Series of the same length, preserving index/name. For DataFrame input, nic_col is required. | required |
| century | int | Century to assume for two-digit years on old NICs. | DEFAULT_OLD_NIC_CENTURY |
| nic_col | str \| None | Required for tabular input. | None |
| errors | ErrorMode | How to handle individual values that cannot be converted. | 'raise' |
| error_col | str \| None | Column name for per-row error messages (DataFrame input only). When supplied, the returned frame gets an extra column with the per-row failure message. | None |
Returns:
| Type | Description |
|---|---|
| Any | The same shape as the input: string for string, list for list, DataFrame for DataFrame. |
Raises:
| Type | Description |
|---|---|
| NICFormatError | If a value cannot be parsed (e.g. wrong length, non-numeric content) and errors='raise'. |
| InvalidInputError | For unsupported input types or invalid arguments. |
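For a single string, the conversion can be sketched as follows. This assumes the commonly described rule (prefix the two-digit year with the century and pad the three-digit serial with a leading zero) and a `century` given as a base year such as 1900; the real value of DEFAULT_OLD_NIC_CENTURY is not stated here. `old_to_new` is a hypothetical helper that raises a plain ValueError where helakit would raise NICFormatError.

```python
import re

# Old format: two-digit year, three-digit day code, three-digit serial, check digit, V or X.
_OLD_NIC = re.compile(r"(\d{2})(\d{3})(\d{3})(\d)[VX]", re.ASCII)


def old_to_new(nic: str, century: int = 1900) -> str:
    """Convert an old-format NIC to 12 digits; raise ValueError if it does not parse."""
    match = _OLD_NIC.fullmatch(nic.strip().upper())
    if match is None:
        raise ValueError(f"not an old-format NIC: {nic!r}")
    yy, day_code, serial, check = match.groups()
    year = century + int(yy)
    # The new format widens the serial from three to four digits with a leading zero.
    return f"{year:04d}{day_code}0{serial}{check}"


converted = old_to_new("853400937V")
```

The V/X letter is dropped in the result, so whatever it encoded is not carried over to the new form.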
is_valid_nic
is_valid_nic(value: str, *, format: FormatHint = 'any') -> bool
Return True if value is a structurally valid NIC.
Scalar-only — for batch checks use :func:validate_nic and inspect
each ValidationResult.
Raises:
| Type | Description |
|---|---|
| InvalidInputError | If value is not a string. |
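A shape-only check with a format hint could be sketched like this. The hint values "old" and "new" are an assumption alongside the documented default 'any', and helakit's notion of structural validity may include checks not shown here, such as day-code ranges. `looks_like_nic` is a hypothetical name, and it raises built-in exceptions in place of InvalidInputError.

```python
import re

_PATTERNS = {
    "old": re.compile(r"\d{9}[VX]", re.ASCII),  # nine digits plus a V or X
    "new": re.compile(r"\d{12}", re.ASCII),     # twelve digits
}


def looks_like_nic(value: str, format: str = "any") -> bool:
    """Return True if value has the shape of an old or new NIC."""
    if not isinstance(value, str):
        raise TypeError("value must be a str")
    if format != "any" and format not in _PATTERNS:
        raise ValueError(f"unknown format hint: {format!r}")
    text = value.strip().upper()
    wanted = _PATTERNS.values() if format == "any" else [_PATTERNS[format]]
    return any(pattern.fullmatch(text) is not None for pattern in wanted)
```

Passing `format="new"` rejects an otherwise well-formed old NIC, which is useful when a dataset is expected to be in one format only.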
is_valid_phone
is_valid_phone(value: str) -> bool
Return True if value is a valid Sri Lankan phone number.
Boolean shorthand for :func:validate_phone. Use this when you only
need a yes/no answer; use :func:validate_phone when you also need
carrier metadata or the normalized form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | The phone number to check, in local or international form. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if value is a valid Sri Lankan phone number, False otherwise. |
Raises:
| Type | Description |
|---|---|
| InvalidInputError | If value is not a string. |
Example
>>> is_valid_phone("0712345678")
True
>>> is_valid_phone("0001234567")
False
is_valid_postal
is_valid_postal(value: str) -> bool
Return True if value is a valid Sri Lankan postal code.
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Postal-code validation has not been implemented yet. |
validate_nic
validate_nic(data: str, *, format: FormatHint = ..., century: int = ...) -> NicResult
validate_nic(data: list[str] | list[dict[str, Any]] | tuple[str, ...], *, format: FormatHint = ..., nic_col: str | None = ..., dob_col: str | None = ..., gender_col: str | None = ..., century: int = ..., errors: BatchErrorMode = ...) -> NICBatchResult
validate_nic(data: Any, *, format: FormatHint = ..., nic_col: str | None = ..., dob_col: str | None = ..., gender_col: str | None = ..., century: int = ..., errors: BatchErrorMode = ...) -> NICBatchResult
validate_nic(data: Any, *, format: FormatHint = 'any', nic_col: str | None = None, dob_col: str | None = None, gender_col: str | None = None, century: int = DEFAULT_OLD_NIC_CENTURY, errors: BatchErrorMode = 'raise') -> NicResult | NICBatchResult
Validate one or many Sri Lankan NIC numbers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Any | A single NIC string, a list or tuple of strings, a list of dicts, or a pandas/polars DataFrame. | required |
| format | FormatHint | Restrict validation to one NIC format; 'any' accepts both. | 'any' |
| nic_col | str \| None | Column name holding NICs (required for tabular input). | None |
| dob_col | str \| None | Column name holding dates of birth. When supplied each row is cross-checked and the per-row result records whether the decoded DOB matched. | None |
| gender_col | str \| None | Column name holding gender. | None |
| century | int | Century to assume for two-digit years on old NICs. Defaults to DEFAULT_OLD_NIC_CENTURY. | DEFAULT_OLD_NIC_CENTURY |
| errors | BatchErrorMode | Batch-only. | 'raise' |
Returns:
| Type | Description |
|---|---|
| NicResult \| NICBatchResult | A :class:NicResult for a single string; a :class:NICBatchResult for list, tuple, or tabular input. |
Raises:
| Type | Description |
|---|---|
| InvalidInputError | For unsupported input types or an invalid argument. |
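The cross-check driven by dob_col and gender_col can be pictured as a comparison between supplied and decoded values. The sketch below is an assumption about that logic: `cross_check` is a hypothetical helper, and the reason labels "dob" and "gender" are placeholders, since the actual contents of mismatch_reasons are not documented here.

```python
from datetime import date
from typing import Optional


def cross_check(
    decoded_dob: date,
    decoded_gender: str,
    supplied_dob: Optional[date] = None,
    supplied_gender: Optional[str] = None,
) -> dict:
    """Compare supplied values with decoded ones; None means that check did not run."""
    dob_match = None if supplied_dob is None else supplied_dob == decoded_dob
    gender_match = (
        None
        if supplied_gender is None
        else supplied_gender.strip().lower() == decoded_gender
    )
    if dob_match is None and gender_match is None:
        reasons = None  # no cross-check ran at all
    else:
        checks = (("dob", dob_match), ("gender", gender_match))
        reasons = [name for name, ok in checks if ok is False]
    return {
        "dob_match": dob_match,
        "gender_match": gender_match,
        "mismatch_reasons": reasons,
    }


outcome = cross_check(date(1985, 12, 5), "male", date(1985, 12, 5), "Female")
```

Keeping None distinct from False lets a summary count mismatches only for rows where both values were available.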
validate_phone
validate_phone(value: str) -> PhoneResult
Validate a Sri Lankan phone number.
The validator accepts numbers in three input forms — local
("0712345678"), international with leading +
("+94712345678"), and international without the +
("94712345678") — and tolerates spaces, hyphens, and parentheses
inside the number (they are stripped before validation).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | The phone number as a string. Must contain ASCII digits only (optionally prefixed with +). | required |
Returns:
| Type | Description |
|---|---|
| PhoneResult | A :class:PhoneResult. On valid input, normalized holds the canonical form and the typed properties are populated: carrier, line_type, local, and decoded. On invalid input, is_valid is False and errors holds :class:ValidationError entries using the codes listed under "Error codes". |
Raises:
| Type | Description |
|---|---|
| InvalidInputError | If value is not a string. |
Error codes
- phone.invalid_characters: input contains characters other than ASCII digits and an optional leading +.
- phone.missing_prefix: input has no recognisable leading 0, +94, or 94.
- phone.invalid_length: input has the right shape but the wrong number of digits (must be 10 in local form).
- phone.unknown_prefix: the three-digit local prefix is not a Sri Lankan mobile or fixed-line prefix.
Example
Basic validation::
>>> result = validate_phone("0712345678")
>>> result.is_valid
True
>>> result.normalized
'+94712345678'
>>> result.carrier
'Mobitel'
>>> result.line_type
'mobile'
International form::
>>> validate_phone("+94772345678").carrier
'Dialog'
Formatting tolerance::
>>> validate_phone("071 234 5678").is_valid
True
>>> validate_phone("(071) 234-5678").is_valid
True
Handling invalid input::
>>> result = validate_phone("0001234567")
>>> result.is_valid
False
>>> result.errors[0].code
'phone.unknown_prefix'
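The normalisation steps and error codes described for validate_phone can be sketched as a single function. The order of the checks is an assumption, `normalize_phone` is a hypothetical helper, and the `CARRIERS` table holds only the two prefixes that appear in the examples on this page; it is not helakit's prefix list.

```python
import re
from typing import Optional, Tuple

# Illustrative only: two prefixes taken from the examples on this page.
CARRIERS = {"071": "Mobitel", "077": "Dialog"}


def normalize_phone(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (normalized, error_code); exactly one of the two is None."""
    text = re.sub(r"[\s\-()]", "", value)  # drop tolerated spaces, hyphens, parentheses
    if not re.fullmatch(r"\+?[0-9]+", text):
        return None, "phone.invalid_characters"
    if text.startswith("+94"):
        local = "0" + text[3:]
    elif text.startswith("94"):
        local = "0" + text[2:]
    elif text.startswith("0"):
        local = text
    else:
        return None, "phone.missing_prefix"
    if len(local) != 10:
        return None, "phone.invalid_length"
    if local[:3] not in CARRIERS:
        return None, "phone.unknown_prefix"
    return "+94" + local[1:], None
```

All three accepted input forms reduce to the same 10-digit local form before the length and prefix checks, so those checks only need to be written once.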
validate_postal
validate_postal(value: str) -> PostalResult
Validate a Sri Lankan postal code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | A five-digit postal code. | required |
Returns:
| Type | Description |
|---|---|
| PostalResult | A :class:PostalResult with district and province populated when valid, accessible as typed properties. |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Postal-code validation has not been implemented yet. |