helakit

Helakit — a toolkit for validating and working with Sri Lankan data.

HelakitError

Bases: Exception

Base class for every exception raised by helakit.

InvalidInputError

Bases: HelakitError

Raised when an input is the wrong type or otherwise unusable.

This signals a programmer error, not a validation failure. A malformed NIC string returns a ValidationResult with is_valid=False; passing None instead of a string raises this exception.
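The split between raising and returning can be illustrated with a minimal, self-contained sketch. Everything below is a hypothetical stand-in written for this example (the Result class, check_nic_shape, and the locally defined InvalidInputError); it is not helakit's implementation, and the shape check is an assumption based on the old and new formats described on this page.

```python
from dataclasses import dataclass, field


class InvalidInputError(Exception):
    """Local stand-in for helakit's exception so the sketch runs on its own."""


@dataclass
class Result:
    """Hypothetical, trimmed-down stand-in for ValidationResult."""
    is_valid: bool
    value: str
    errors: list = field(default_factory=list)


def check_nic_shape(value):
    """Hypothetical checker: raise on a wrong type, return a result otherwise."""
    if not isinstance(value, str):
        # A wrong type is a programmer error, so it raises.
        raise InvalidInputError(f"expected str, got {type(value).__name__}")
    text = value.strip().upper()
    ascii_ok = text.isascii()
    old_ok = ascii_ok and len(text) == 10 and text[:9].isdigit() and text[9] in "VX"
    new_ok = ascii_ok and len(text) == 12 and text.isdigit()
    if old_ok or new_ok:
        return Result(True, value)
    # Malformed content is a validation failure, so it is reported, not raised.
    return Result(False, value, ["unrecognised NIC shape"])
```

With this contract, callers can reserve try/except for genuine bugs and handle bad data by branching on is_valid.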

NICBatchResult dataclass

NICBatchResult(results: list[ValidationResult], duplicates: dict[str, list[int]] = dict(), summary: NICSummary = (lambda: NICSummary(0, 0, 0, 0, 0, 0, 0))(), df: Any | None = None)

The outcome of validating a list / dataframe of NICs.

Attributes:

Name Type Description
results list[ValidationResult]

One ValidationResult per input row, in the same order as the input.

duplicates dict[str, list[int]]

Mapping from canonical NIC (uppercased, V/X stripped) to the row indices it appeared at. Only entries with two or more indices are included.

summary NICSummary

Roll-up counts.

df Any | None

When the input was a pandas or polars DataFrame, this is a copy of that frame with helakit's per-row columns appended. None when the input was a list.
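As a rough illustration of how a duplicates mapping of this shape could be built, the sketch below groups row indices by canonical NIC. The helper names canonical_nic and find_duplicates are hypothetical, and trimming surrounding whitespace is an assumption; only the uppercasing, the V/X stripping, and the "two or more indices" rule come from the description above.

```python
from collections import defaultdict


def canonical_nic(value: str) -> str:
    """Uppercase, trim, and drop a trailing V or X (assumed canonical form)."""
    text = value.strip().upper()
    if text.endswith(("V", "X")):
        text = text[:-1]
    return text


def find_duplicates(nics: list) -> dict:
    """Map each canonical NIC seen more than once to the rows it appeared at."""
    seen = defaultdict(list)
    for index, nic in enumerate(nics):
        seen[canonical_nic(nic)].append(index)
    # Keep only groups with two or more row indices.
    return {key: rows for key, rows in seen.items() if len(rows) >= 2}
```

For example, find_duplicates(["853400937V", "853400937v", "911234567X"]) groups the first two rows under "853400937" and omits the third.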

NICDecoded dataclass

NICDecoded(format: Literal['old', 'new'], dob: date, gender: Literal['male', 'female'], age: int, year: int, day_code: int, serial: int, check_digit: int, voting_eligible: bool | None)

Structured fields extracted from a valid NIC.

Attributes:

Name Type Description
format Literal['old', 'new']

Either "old" (9 digits + V/X) or "new" (12 digits).

dob date

Decoded date of birth.

gender Literal['male', 'female']

Either "male" or "female", derived from the day-of-year encoding.

age int

Age in completed years at the time of validation.

year int

Full birth year.

day_code int

Day-of-year encoding with the female 500 offset removed.

serial int

Serial number assigned on the registration day.

check_digit int

Final check digit (not currently verified; see helakit.nic._parse).

voting_eligible bool | None

True if the old NIC ends in V, False if X, None for new-format NICs.
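To make the field layout concrete, here is a small sketch that splits a NIC string into most of the fields listed above. It is not helakit's parser: split_nic is a hypothetical helper, the digit positions follow the commonly described layout (year, three-digit day code, serial, check digit), and the century for old NICs is assumed to be 1900. It deliberately skips dob and age, which need calendar logic.

```python
def split_nic(nic: str) -> dict:
    """Split a NIC into its raw fields (hypothetical helper, assumed layout)."""
    text = nic.strip().upper()
    if not text.isascii():
        raise ValueError(f"not a recognised NIC: {nic!r}")
    if len(text) == 10 and text[:9].isdigit() and text[9] in "VX":
        fmt = "old"
        year = 1900 + int(text[:2])          # assumed century for old NICs
        day_raw = int(text[2:5])
        serial = int(text[5:8])
        check_digit = int(text[8])
        voting_eligible = text[9] == "V"     # V means eligible, X means not
    elif len(text) == 12 and text.isdigit():
        fmt = "new"
        year = int(text[:4])
        day_raw = int(text[4:7])
        serial = int(text[7:11])
        check_digit = int(text[11])
        voting_eligible = None               # not encoded in the new format
    else:
        raise ValueError(f"not a recognised NIC: {nic!r}")
    # Female NICs carry the day of year plus 500; remove the offset.
    is_female = day_raw > 500
    return {
        "format": fmt,
        "year": year,
        "gender": "female" if is_female else "male",
        "day_code": day_raw - 500 if is_female else day_raw,
        "serial": serial,
        "check_digit": check_digit,
        "voting_eligible": voting_eligible,
    }
```

The sketch does not range-check day_code, so a value such as 400 would pass through; a real validator would reject it.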

NICError

Bases: HelakitError

Base class for NIC-related errors raised by helakit.

NICFormatError

Bases: NICError

Raised when an input cannot be parsed as either NIC format.

Used by helakit.nic.convert_nic because conversion has no sensible ValidationResult to return, so failure must propagate as an exception.

NICSummary dataclass

NICSummary(total: int, valid: int, invalid: int, duplicate_groups: int, duplicate_rows: int, dob_mismatches: int, gender_mismatches: int)

Aggregate counts for a batch validation run.

Attributes:

Name Type Description
total int

Number of input rows processed.

valid int

Rows whose NIC passed structural validation.

invalid int

Rows that failed structural validation.

duplicate_groups int

Number of distinct NICs that appear in more than one row (after canonicalisation that strips V/X suffixes).

duplicate_rows int

Total rows participating in any duplicate group.

dob_mismatches int

Rows where the supplied DOB differed from the NIC-decoded DOB (only counted when both were available).

gender_mismatches int

Rows where the supplied gender differed from the NIC-decoded gender (only counted when both were available).
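The two duplicate counts relate to a duplicates mapping in a simple way, sketched below. The duplicate_counts helper is hypothetical and assumes a mapping of canonical NIC to row indices in which every entry already has two or more indices.

```python
def duplicate_counts(duplicates: dict) -> tuple:
    """Return (duplicate_groups, duplicate_rows) for a duplicates mapping."""
    groups = len(duplicates)  # one group per distinct repeated NIC
    rows = sum(len(indices) for indices in duplicates.values())
    return groups, rows
```

A mapping with one NIC at three rows and another at two rows gives 2 groups and 5 rows.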

ValidationError dataclass

ValidationError(code: str, message: str, field: str | None = None)

A single validation failure.

Attributes:

Name Type Description
code str

A short machine-readable identifier (e.g. "nic.bad_checksum").

message str

A human-readable description of what went wrong.

field str | None

The specific field within the input that failed, if applicable.

ValidationResult dataclass

ValidationResult(is_valid: bool, value: str, normalized: str | None = None, errors: list[ValidationError] = list(), data: dict[str, Any] = dict())

The outcome of validating a single value.

Attributes:

Name Type Description
is_valid bool

True if the input passed every check.

value str

The original input string, unmodified.

normalized str | None

The canonical representation of value when valid.

errors list[ValidationError]

Every error encountered. Empty when is_valid is True.

data dict[str, Any]

Structured fields extracted during validation (e.g. NIC date of birth, gender). Empty when no fields could be extracted.

convert_nic

convert_nic(value: str, *, century: int = ...) -> str
convert_nic(value: list[str] | tuple[str, ...], *, century: int = ..., errors: ErrorMode = ...) -> list[str | None]
convert_nic(value: Any, *, century: int = ..., nic_col: str | None = ..., errors: ErrorMode = ..., error_col: str | None = ...) -> Any
convert_nic(value: Any, *, century: int = DEFAULT_OLD_NIC_CENTURY, nic_col: str | None = None, errors: ErrorMode = 'raise', error_col: str | None = None) -> Any

Convert an old-format NIC to the new 12-digit format.

Parameters:

Name Type Description Default
value Any

Either a single NIC string, a list of strings, a pandas or polars Series, or a pandas/polars DataFrame. For Series input the function returns a Series of the same length, preserving index/name. For DataFrame input nic_col specifies which column to convert; the function returns a copy of the frame with a new nic_converted column added.

required
century int

Century to assume for two-digit years on old NICs.

DEFAULT_OLD_NIC_CENTURY
nic_col str | None

Required for tabular input.

None
errors ErrorMode

How to handle individual values that cannot be converted. "raise" (default) propagates NICFormatError on the first bad value, the same as pandas strict mode. "coerce" replaces bad values with None so a single malformed row no longer fails the whole batch. "ignore" passes the original input through untouched. Scalar input always raises regardless of this setting.

'raise'
error_col str | None

Column name for per-row error messages (DataFrame input only). When supplied, the returned frame gets an extra column with the failure message or None for rows that converted cleanly. Implies errors="coerce" if errors is left at the default of "raise".

None

Returns:

Type Description
Any

The same shape as the input: string for string, list for list, DataFrame for DataFrame.

Raises:

Type Description
NICFormatError

If a value cannot be parsed (e.g. wrong length, non-numeric content) and errors="raise". New-format input passes through unchanged.

InvalidInputError

For unsupported input types or invalid errors values.
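The conversion and the three error modes can be sketched without the library. The code below is an illustration under stated assumptions, not helakit's implementation: old_to_new and convert_many are hypothetical names, NICFormatError is a local stand-in, century is taken as a full year base such as 1900, and the rule used is the commonly described one (prefix the four-digit year, then insert a 0 before the three-digit serial).

```python
class NICFormatError(Exception):
    """Local stand-in for helakit's exception so the sketch runs on its own."""


def old_to_new(nic: str, century: int = 1900) -> str:
    """Convert a 9-digit + V/X NIC to 12 digits (assumed conversion rule)."""
    text = nic.strip().upper()
    if text.isascii() and len(text) == 12 and text.isdigit():
        return text  # new-format input passes through unchanged
    if text.isascii() and len(text) == 10 and text[:9].isdigit() and text[9] in "VX":
        year = century + int(text[:2])
        # year (4) + day code (3) + padding 0 + serial and check digit (4)
        return f"{year:04d}{text[2:5]}0{text[5:9]}"
    raise NICFormatError(f"cannot parse NIC: {nic!r}")


def convert_many(values: list, errors: str = "raise") -> list:
    """Apply old_to_new to a list, honouring raise / coerce / ignore."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors mode: {errors!r}")
    converted = []
    for value in values:
        try:
            converted.append(old_to_new(value))
        except NICFormatError:
            if errors == "raise":
                raise
            # coerce replaces the bad value with None; ignore keeps the original
            converted.append(None if errors == "coerce" else value)
    return converted
```

Under these assumptions, "853400937V" becomes "198534000937", and a list containing one malformed entry either fails as a whole, yields None in that slot, or keeps the original string, depending on the mode.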

is_valid_nic

is_valid_nic(value: str, *, format: FormatHint = 'any') -> bool

Return True if value is a structurally valid NIC.

Scalar-only. For batch checks, use validate_nic and inspect each ValidationResult.

Raises:

Type Description
InvalidInputError

If value is not a string.

is_valid_phone

is_valid_phone(value: str) -> bool

Return True if value is a valid Sri Lankan phone number.

Parameters:

Name Type Description Default
value str

The phone number to check (local or international form).

required

Returns:

Type Description
bool

True when valid, False otherwise.

Raises:

Type Description
InvalidInputError

If value is not a string.

is_valid_postal

is_valid_postal(value: str) -> bool

Return True if value is a valid Sri Lankan postal code.

Raises:

Type Description
NotImplementedError

Postal-code validation has not been implemented yet.

validate_nic

validate_nic(data: str, *, format: FormatHint = ..., century: int = ...) -> ValidationResult
validate_nic(data: list[str] | list[dict[str, Any]] | tuple[str, ...], *, format: FormatHint = ..., nic_col: str | None = ..., dob_col: str | None = ..., gender_col: str | None = ..., century: int = ..., errors: BatchErrorMode = ...) -> NICBatchResult
validate_nic(data: Any, *, format: FormatHint = ..., nic_col: str | None = ..., dob_col: str | None = ..., gender_col: str | None = ..., century: int = ..., errors: BatchErrorMode = ...) -> NICBatchResult
validate_nic(data: Any, *, format: FormatHint = 'any', nic_col: str | None = None, dob_col: str | None = None, gender_col: str | None = None, century: int = DEFAULT_OLD_NIC_CENTURY, errors: BatchErrorMode = 'raise') -> ValidationResult | NICBatchResult

Validate one or many Sri Lankan NIC numbers.

Parameters:

Name Type Description Default
data Any

A single NIC string, a list[str], a list[dict], a pandas DataFrame, or a polars DataFrame.

required
format FormatHint

Restrict to "old" or "new" only. "any" (default) accepts both.

'any'
nic_col str | None

Column name holding NICs (required for tabular input).

None
dob_col str | None

Column name holding dates of birth. When supplied each row is cross-checked and the per-row result records whether the decoded DOB matched.

None
gender_col str | None

Column name holding gender (M/F/Male/Female, case-insensitive).

None
century int

Century to assume for two-digit years on old NICs. Defaults to 1900.

DEFAULT_OLD_NIC_CENTURY
errors BatchErrorMode

Batch-only. "raise" (default) propagates InvalidInputError if any cross-check dob or gender value is unparseable, matching strict pandas semantics. "coerce" records the failure as a per-row error in ValidationResult.errors (and the nic_errors column on DataFrame output) so a single malformed row no longer aborts the whole batch.

'raise'

Returns:

Type Description
ValidationResult | NICBatchResult

A ValidationResult for scalar input, or a NICBatchResult for any iterable input.

Raises:

Type Description
InvalidInputError

For unsupported input types, an invalid errors value, or unparseable gender / DOB values when errors="raise".
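The gender cross-check and the effect of the errors mode can be pictured with the sketch below. It is a hypothetical illustration only: normalise_gender and gender_matches are invented names, InvalidInputError is a local stand-in, and the accepted spellings are limited to the M/F/Male/Female values listed for gender_col.

```python
from typing import Optional, Tuple


class InvalidInputError(Exception):
    """Local stand-in for helakit's exception so the sketch runs on its own."""


def normalise_gender(raw: str) -> str:
    """Map M/F/Male/Female (any case) to 'male' or 'female'."""
    mapping = {"m": "male", "male": "male", "f": "female", "female": "female"}
    key = raw.strip().lower()
    if key not in mapping:
        raise InvalidInputError(f"unparseable gender value: {raw!r}")
    return mapping[key]


def gender_matches(
    decoded: str, supplied: str, errors: str = "raise"
) -> Tuple[Optional[bool], Optional[str]]:
    """Return (matched, error_message) for one row of a batch."""
    try:
        expected = normalise_gender(supplied)
    except InvalidInputError as exc:
        if errors == "raise":
            raise  # strict mode: one bad value aborts the batch
        return None, str(exc)  # coerce mode: record the problem for this row
    return expected == decoded, None
```

In coerce mode the caller gets a per-row message instead of an exception, which mirrors the behaviour described for the errors parameter.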

validate_phone

validate_phone(value: str) -> ValidationResult

Validate a Sri Lankan phone number.

Parameters:

Name Type Description Default
value str

The phone number in local ("0712345678") or international ("+94712345678") form. Spaces and hyphens are stripped before validation.

required

Returns:

Type Description
ValidationResult

A ValidationResult. When valid, normalized holds the canonical +94XXXXXXXXX form and data contains:

  • "decoded" - a PhoneDecoded with carrier, line_type and local representation.
  • "carrier" - the network operator name (e.g. "Dialog").
  • "line_type" - "mobile" or "fixed".
  • "local" - the local 0XXXXXXXXX representation.

Raises:

Type Description
InvalidInputError

If value is not a string.
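The normalisation step described above (strip spaces and hyphens, then produce the +94XXXXXXXXX and 0XXXXXXXXX forms) can be sketched as follows. normalise_phone is a hypothetical helper, not part of helakit, and it only checks the overall shape; it makes no attempt to validate operator prefixes or to determine carrier or line type.

```python
import re
from typing import Optional, Tuple


def normalise_phone(value: str) -> Optional[Tuple[str, str]]:
    """Return (international, local) forms, or None if the shape is wrong."""
    compact = re.sub(r"[\s-]", "", value)  # drop spaces and hyphens
    if compact.startswith("+94"):
        national = compact[3:]
    elif compact.startswith("0"):
        national = compact[1:]
    else:
        return None
    # Both documented forms leave nine digits after the prefix.
    if len(national) != 9 or not (national.isascii() and national.isdigit()):
        return None
    return "+94" + national, "0" + national
```

Both "071 234-5678" and "+94712345678" reduce to the same pair, ("+94712345678", "0712345678").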

validate_postal

validate_postal(value: str) -> ValidationResult

Validate a Sri Lankan postal code.

Parameters:

Name Type Description Default
value str

A five-digit postal code.

required

Returns:

Type Description
ValidationResult

A ValidationResult with the matching district and province populated when valid.

Raises:

Type Description
NotImplementedError

Postal-code validation has not been implemented yet.