Gitlab - Argos ALM by PALO IT

feat: dict-mapper lib

transforms the structure of a dictionary based on a given set of transformation rules
parent 7887b3a4
# Created by https://www.toptal.com/developers/gitignore/api/python,visualstudiocode,pycharm
# Edit at https://www.toptal.com/developers/gitignore?templates=python,visualstudiocode,pycharm
### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# User-specific stuff
.idea/**
# CMake
cmake-build-*/
# Mongo Explorer plugin
.idea/**/mongoSettings.xml
# File-based project format
*.iws
# IntelliJ
out/
# mpeltonen/sbt-idea plugin
.idea_modules/
# JIRA plugin
atlassian-ide-plugin.xml
# Cursive Clojure plugin
.idea/replstate.xml
# SonarLint plugin
.idea/sonarlint/
# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
# Editor-based Rest Client
.idea/httpRequests
# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
### PyCharm Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr
# Sonarlint plugin
# https://plugins.jetbrains.com/plugin/7973-sonarlint
.idea/**/sonarlint/
# SonarQube Plugin
# https://plugins.jetbrains.com/plugin/7238-sonarqube-community-plugin
.idea/**/sonarIssues.xml
# Markdown Navigator plugin
# https://plugins.jetbrains.com/plugin/7896-markdown-navigator-enhanced
.idea/**/markdown-navigator.xml
.idea/**/markdown-navigator-enh.xml
.idea/**/markdown-navigator/
# Cache file creation bug
# See https://youtrack.jetbrains.com/issue/JBR-2257
.idea/$CACHE_FILE$
# CodeStream plugin
# https://plugins.jetbrains.com/plugin/12206-codestream
.idea/codestream.xml
# Azure Toolkit for IntelliJ plugin
# https://plugins.jetbrains.com/plugin/8053-azure-toolkit-for-intellij
.idea/**/azureSettings.xml
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
### Python Patch ###
# Poetry local configuration file - https://python-poetry.org/docs/configuration/#local-configuration
poetry.toml
# ruff
.ruff_cache/
# LSP config files
pyrightconfig.json
### VisualStudioCode ###
.vscode/
# Local History for Visual Studio Code
.history/
# Built Visual Studio Code Extensions
*.vsix
### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide
# End of https://www.toptal.com/developers/gitignore/api/python,visualstudiocode,pycharm
\ No newline at end of file
# dict-mapper
Dict-Mapper library provides a flexible and powerful way to transform the structure of a dictionary based on a given set
of transformation rules. This can be particularly useful when you need to reshape data to fit a specific structure or
format.
## Features
The main function in this module is `map_dict(original_dict, specification, keep_unmapped=False)`, which takes an
original dictionary and a specification dictionary as inputs, and returns a new dictionary that has been transformed
according to the specification.
The specification dictionary defines how the original dictionary should be transformed. Its keys are the keys of the new
dictionary, and its values can be:
- A string representing the key in the original dictionary.
- An instance of the 'skip' class to indicate that the key should be omitted.
- A function that takes the original dictionary and returns a value.
## Quick Start
````python
import uuid
from dict_mapper.mapping import map_dict, calculate_from_original, factory, skip
original = {"first_name": "Mac", "last_name": "Doe", "age": 30, "dob": "1990-01-01"}
specification = {
    "id": factory(lambda: str(uuid.uuid4())),
    "first_name": skip(),
    "last_name": skip(),
    "full_name": calculate_from_original(
        lambda d: d["first_name"] + " " + d["last_name"]
    ),
}
result = map_dict(original, specification, keep_unmapped=True)
print(result)
"""
{
    "age": 30,
    "dob": "1990-01-01",
    "id": "160281d4-a243-46b1-b3ac-23e7de7eb178",
    "full_name": "Mac Doe",
}
"""
````
## Usage
```python
import uuid
from datetime import date, datetime
from dict_mapper.mapping import (
    calculate_from_key,
    calculate_from_original,
    constant,
    date_today,
    datetime_now,
    factory,
    map_dict,
    or_default,
    with_type,
    skip,
)
original = {
    "first_name": "Mac",
    "last_name": "Doe",
    "age": 30,
    "dob": "1990-01-01",
    "address": {
        "street": "127 Manuel Fernando de Soto",
        "suburb": "Ciudad de México",
        "postcode": "07469",
    },
}
specification = {
    "id": factory(lambda: str(uuid.uuid4())),
    "status": constant("ACTIVE"),
    "start_date": or_default("start_date", default=date.today()),
    "created_at": datetime_now(),
    "first_name": skip(),
    "last_name": skip(),
    "full_name": calculate_from_original(
        lambda d: d["first_name"] + " " + d["last_name"]
    ),
    "day_of_birth": with_type("dob", rtype=date),
    "home_address": "address",
    "home_address.suburb": skip(),
    "home_address.country.code": skip(),
}
result = map_dict(original, specification, keep_unmapped=True)
print(result)
"""
{
    "age": 30,
    "dob": "1990-01-01",
    "address": {
        "street": "127 Manuel Fernando de Soto",
        "suburb": "Ciudad de México",
        "postcode": "07469",
    },
    "id": "77e7e267-f3fa-4a94-9f2b-d1e98c135a8c",
    "status": "ACTIVE",
    "start_date": datetime.date(2024, 4, 11),
    "created_at": datetime.datetime(2024, 4, 11, 20, 57, 35, 91853),
    "full_name": "Mac Doe",
    "day_of_birth": datetime.date(1990, 1, 1),
    "home_address": {"street": "127 Manuel Fernando de Soto", "postcode": "07469"},
}
"""
```
In this example, the `map_dict()` function is used to create a new dictionary from the `original` dictionary.
The `specification` dictionary defines how the original dictionary should be transformed.
The `first_name` and `last_name` keys are removed from the result, a new `full_name` key is calculated from the original
dictionary, and the `address` key is renamed to `home_address` in the new dictionary. The `keep_unmapped` parameter is
set to `True`, so all keys in the original dictionary that are not in the specification are kept in the new dictionary.
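According to the library source, with `keep_unmapped=True` the result starts as `copy.deepcopy(original_dict)`, so removing keys from the result should leave the input untouched. A minimal standalone sketch of that behavior, using only the standard library (the variable names here are illustrative and not part of the library):

```python
import copy

original = {"age": 30, "address": {"suburb": "Ciudad de México", "postcode": "07469"}}

# Start from a deep copy, as map_dict does when keep_unmapped=True.
result = copy.deepcopy(original)

# Simulate a skip() rule applied to a nested key of the result.
result["address"].pop("suburb", None)

print(result["address"])    # {'postcode': '07469'}
print(original["address"])  # the input still contains 'suburb'
```

A shallow copy would share the nested `address` dictionary, and the removal would also affect the input.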
The keys of the `specification` become the keys of the new dictionary. Each value of the `specification` tells the
mapper where the value for that key comes from. A value can be a string naming the key in the original dictionary,
an instance of the `skip` class to indicate that the key should be omitted, or a function that takes the original
dictionary and returns a new value.
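Specification keys may also be dotted paths such as `home_address.suburb`, which address nested dictionaries in the result. The sketch below illustrates that path handling with two standalone helpers. `set_nested` and `remove_nested` are simplified stand-ins written for this example, loosely modeled on the private helpers in the library source; unlike the library, `remove_nested` does not log a warning when a path is missing.

```python
def set_nested(obj: dict, path: str, value) -> None:
    """Set `value` at a dotted `path`, creating intermediate dicts as needed."""
    *parents, last = path.split(".")
    for part in parents:
        obj = obj.setdefault(part, {})
    obj[last] = value


def remove_nested(obj: dict, path: str) -> None:
    """Remove the key at a dotted `path`; do nothing if any part is missing."""
    *parents, last = path.split(".")
    for part in parents:
        if not isinstance(obj.get(part), dict):
            return
        obj = obj[part]
    obj.pop(last, None)


result = {}
set_nested(result, "name.first_name", "Mac")
set_nested(result, "name.last_name", "Doe")
remove_nested(result, "name.last_name")
remove_nested(result, "name.country.code")  # missing path: no error
print(result)  # {'name': {'first_name': 'Mac'}}
```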
The library provides several helper functions that can be used in the specification dictionary:
- `constant(value)`: Assigns the given value to the specification key.
- `or_default(key: str, default)`: Assigns the value of the key in the original dictionary, or the default value if
  the key is not present.
- `calculate_from_original(func)`: Applies the given function to the original dictionary and assigns the result to the
  specification key.
- `calculate_from_key(key: str, transform, default=None)`: Applies the transform function to the value of the key in
  the original dictionary (or to `default` if the key is not present) and assigns the result to the specification key.
- `factory(func)`: Calls the function with no arguments and assigns the result to the specification key. Useful for
  generating unique IDs or initializing objects such as a list or dict.
- `datetime_now()`: Assigns the current datetime to the specification key.
- `date_today()`: Assigns the current date to the specification key.
- `with_type(key: str, rtype: Type[Any], default=None, _format: str = None)`: Converts the value of the key in the
  original dictionary to the given type. Strings are converted to `bool`, `date`, or `datetime` by dedicated
  converters; any other type is applied directly as `rtype(value)`.
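
Each helper returns a function that receives the original dictionary, which is why helpers and plain lambdas can be mixed freely in a specification. The following standalone sketch shows that pattern. The functions are simplified stand-ins for illustration only: they use a flat `dict.get` lookup instead of the dotted-path lookup the library delegates to `dictor`, and `to_bool` is a hypothetical name for the string-to-bool rule.

```python
from datetime import date

TRUE_STRINGS = ("yes", "true", "t", "1")
FALSE_STRINGS = ("no", "false", "f", "0")


def constant(value):
    # Ignore the original dictionary and always return the same value.
    return lambda _original: value


def factory(func):
    # Call `func` with no arguments every time the rule is evaluated.
    return lambda _original: func()


def calculate_from_key(key, transform, default=None):
    # Simplification: flat dict.get lookup, no dotted paths.
    return lambda original: transform(original.get(key, default))


def to_bool(value: str):
    # Returns None for strings that are neither truthy nor falsy.
    lowered = value.lower()
    if lowered in TRUE_STRINGS:
        return True
    if lowered in FALSE_STRINGS:
        return False
    return None


original = {"status": "active", "is_married": "No", "dob": "1990-01-01"}
rules = {
    "status": calculate_from_key("status", str.upper),
    "kind": constant("PERSON"),
    "tags": factory(list),
    "is_married": lambda d: to_bool(d["is_married"]),
    "dob": lambda d: date.fromisoformat(d["dob"]),
}
# Evaluate every rule against the original dictionary.
result = {target: rule(original) for target, rule in rules.items()}
print(result)
```

Because `factory` calls its function on every evaluation, each mapped dictionary gets its own fresh list rather than a shared one.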
\ No newline at end of file
[tool.poetry]
name = "dict-mapper"
version = "1.0.0"
description = "dict to dict mapping, allow transform the original structure given a specification"
authors = ["Miguel Galindo Rodriguez <mgalindo@palo-it.com>"]
readme = "README.md"
packages = [{include = "dict_mapper", from = "src"}]
[tool.poetry.dependencies]
python = "^3.11"
dictor = "^0.1.12"
[tool.poetry.group.dev.dependencies]
pytest = "^8.1.1"
black = "^24.3.0"
pytest-cov = "^5.0.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[tool.pytest.ini_options]
testpaths = "tests"
filterwarnings = ["error", "ignore:The 'app' shortcut is now deprecated"]
log_cli = false
log_cli_level = "INFO"
log_cli_format = "%(asctime)s [%(levelname)8s] %(message)s (%(filename)s:%(lineno)s)"
log_cli_date_format = "%Y-%m-%d %H:%M:%S"
[tool.coverage.run]
branch = true
source = ['src/dict_mapper']
[tool.coverage.report]
skip_empty = true
exclude_also = ["def __repr__", "raise AssertionError", "raise NotImplementedError", "@(abc\\.)?abstractmethod", "pass"]
import copy
import logging
from datetime import datetime, date
from typing import Any, Type
from dictor import dictor
logger = logging.getLogger(__name__)
TRUE_STRINGS = ("yes", "true", "t", "1")
FALSE_STRINGS = ("no", "false", "f", "0")
skip = type("skip", (), {})
def constant(value):
    return lambda _: value


def or_default(key, default):
    return lambda d: dictor(d, key, default=default)


def calculate_from_original(func):
    return lambda d: func(d)


def calculate_from_key(key, transform, default=None):
    return lambda d: transform(dictor(d, key, default=default))


def factory(func):
    return lambda _: func()


def datetime_now():
    return lambda _: datetime.now()


def date_today():
    return lambda _: date.today()
def convert_bool(value: str):
    value_lower = value.lower()
    if value_lower in TRUE_STRINGS:
        return True
    elif value_lower in FALSE_STRINGS:
        return False
    return None


def convert_date(value: str, _format: str) -> date:
    if _format is None:
        return datetime.fromisoformat(value).date()
    else:
        return datetime.strptime(value, _format).date()


def convert_datetime(value: str, _format: str) -> datetime:
    if _format is None:
        return datetime.fromisoformat(value)
    else:
        return datetime.strptime(value, _format)


def with_type(key: str, rtype: Type[Any], default=None, _format: str = None):
    conversion_map = {
        bool: convert_bool,
        date: lambda value: convert_date(value, _format),
        datetime: lambda value: convert_datetime(value, _format),
    }

    def convert(data: dict) -> Any:
        value = dictor(data, key)
        if value is None:
            return default
        elif rtype in conversion_map:
            convert_func = conversion_map[rtype]
            return convert_func(value)
        else:
            return rtype(value)

    return convert
def __set_nested(obj: dict, path: str, value) -> None:
    if "." in path:
        *path, last = path.split(".")
        for bit in path:
            if bit not in obj:
                obj[bit] = {}
            obj = obj[bit]
        obj[last] = value
    else:
        obj[path] = value


def __remove_nested(obj: dict, path: str) -> None:
    if "." in path:
        *path, last = path.split(".")
        key_exist = True
        for bit in path:
            if bit not in obj:
                key_exist = False
                break
            obj = obj[bit]
        if key_exist:
            obj.pop(last, None)
        else:
            logger.warning("Key not found: %s", path)
    else:
        obj.pop(path, None)
def map_dict(
    original_dict: dict,
    specification: dict,
    keep_unmapped: bool = False,
) -> dict:
    """
    Transforms an input dictionary according to a given specification.

    Args:
        original_dict (dict): The original dictionary to be transformed.
        specification (dict): A dictionary specifying how the original dictionary should be transformed.
            The keys are the keys of the new dictionary and the values can be:
            - A string representing the key in the original dictionary.
            - An instance of the 'skip' class to indicate that the key should be omitted.
            - A function that takes the original dictionary and returns a value. E.g. lambda original_dict: original_dict["key"] + "suffix"
            The following functions are available for use in the specification:
            - `constant(value)`: Assigns the given value to the given specification key.
            - `or_default(key, default)`: Assigns the value of the key in the dictionary or the default value if the key is not present.
            - `calculate_from_original(func)`: Applies the given function to the original dictionary and assigns the result to the given key.
            - `calculate_from_key(key, transform, default=None)`: Applies the transform function to the value of the key in the dictionary and assigns the result to the given key.
            - `factory(func)`: Calls the function with no arguments and assigns the result to the given key. Useful for generating unique IDs or initializing objects such as a list or dict.
            - `datetime_now()`: Assigns the current datetime to the given key.
            - `date_today()`: Assigns the current date to the given key.
            - `with_type(key: str, rtype: Type[Any], default=None, _format: str = None)`: Converts the value of the key in the dictionary from str to the given type (bool, date, datetime, or any callable type).
        keep_unmapped (bool, optional): If True, the result starts as a deep copy of the original dictionary, so keys
            that are not in the specification are kept.
            Defaults to False, which means only keys in specification will be present in the result.

    Returns:
        dict: A new dictionary that has been transformed according to the given specification.
    """
    new_dict = {}
    if keep_unmapped:
        new_dict = copy.deepcopy(original_dict)
    for key_target, key_origin in specification.items():
        if isinstance(key_origin, str):
            value = dictor(original_dict, key_origin)
            __set_nested(new_dict, key_target, copy.deepcopy(value))
        elif isinstance(key_origin, skip):
            __remove_nested(new_dict, key_target)
        elif callable(key_origin):
            __set_nested(new_dict, key_target, key_origin(original_dict))
        else:
            logger.warning(
                "Unexpected type for mapping key_origin:[%s], type:[%s]",
                repr(key_origin),
                type(key_origin),
            )
    return new_dict
import logging
import uuid
from datetime import date, datetime
import pytest
from dict_mapper.mapping import (
    calculate_from_key,
    calculate_from_original,
    constant,
    date_today,
    datetime_now,
    factory,
    map_dict,
    or_default,
    with_type,
    skip,
)

logger = logging.getLogger(__name__)


def get_id():
    return "ID"


def calculate_complete_name(original_dict):
    return original_dict["data"]["firstName"] + " " + original_dict["data"]["lastName"]


def generate_status(original_status):
    if original_status is not None:
        return original_status.upper()
    else:
        return "UNDEFINED"


def get_email(emailAddresses):
    email = None
    for emailAddress in emailAddresses:
        if emailAddress["isPrimary"]:
            email = emailAddress["email"]
            break
    return email
@pytest.fixture
def person_response():
    return {
        "data": {
            "id": "f4fe41a5-5da4-480d-8f83-856b8cf5a138",
            "displayName": "Abiran Natanael Salas Hernandez",
            "firstName": "Abiran Natanael",
            "lastName": "Salas Hernandez",
            "dateOfBirth": "1991-10-07",
            "gender": "Hombre",
            "isSupervisor": False,
            "isMarried": "false",
            "employeeNumber": "096-MEX",
            "employmentStatus": "Current Staff",
            "addresses": [
                {
                    "addressType": "Casa",
                    "fullAddress": "Manuel Fernando de Soto 127, Constitución de la República, Mexico City, CDMX",
                    "country": "Mexico",
                    "postcode": "07469",
                    "state": "Ciudad de México",
                    "street": "127 Manuel Fernando de Soto",
                    "suburb": "Ciudad de México",
                    "isPrimary": True,
                    "customFields": {},
                    "emailAddresses": [
                        {
                            "email": "asalas@palo-it.com",
                            "isPrimary": True,
                            "isPersonal": False,
                            "customFields": {},
                        },
                        {
                            "email": "abisaher@gmail.com",
                            "isPrimary": False,
                            "isPersonal": True,
                            "customFields": {},
                        },
                    ],
                }
            ],
            "first_commit": "2022-11-11T22:38:14+00:00",
        }
    }
def test_map(person_response):
    # given:
    specification = {
        "id": factory(get_id),
        "name": "data.displayName",
        "dob": with_type("data.dateOfBirth", rtype=date),
        "gender_str": or_default(key="not_exist", default="Male"),
        "is_supervisor": or_default(key="data.isSupervisor", default=True),
        "complete_name": calculate_from_original(calculate_complete_name),
        "employment_status": calculate_from_key(
            key="data.employmentStatus",
            default="Developer",
            transform=generate_status,
        ),
        "employment": calculate_from_key(
            key="employment", default="Developer", transform=generate_status
        ),
        "dojo": calculate_from_key(key="dojo", transform=generate_status),
        "status": constant("ACTIVE"),
        "first_commit": with_type("data.first_commit", rtype=datetime),
        "created_at": datetime_now(),
        "today": date_today(),
        "isMarried": with_type(key="data.isMarried", rtype=bool),
        "isHaveKids": with_type(key="data.not_exist", rtype=bool, default=False),
        "isTodayBirthday": with_type(key="data.dateOfBirth", rtype=bool),
        "not_exist": "not_exist",
    }
    # when:
    result = map_dict(person_response, specification)
    # then:
    assert result is not None
    assert result["id"] == "ID"
    assert result["name"] == "Abiran Natanael Salas Hernandez"
    assert result["dob"] == datetime.strptime("1991-10-07", "%Y-%m-%d").date()
    assert result["gender_str"] == "Male"
    assert result["is_supervisor"] == False
    assert result["complete_name"] == "Abiran Natanael Salas Hernandez"
    assert result["employment_status"] == "CURRENT STAFF"
    assert result["employment"] == "DEVELOPER"
    assert result["dojo"] == "UNDEFINED"
    assert result["status"] == "ACTIVE"
    assert result["first_commit"] == datetime.fromisoformat("2022-11-11T22:38:14+00:00")
    assert result["created_at"].date() == datetime.today().date()
    assert result["today"] == datetime.today().date()
    assert result["isMarried"] is False
    assert result["isHaveKids"] is False
    assert result["isTodayBirthday"] is None
    assert result["not_exist"] is None
def test_map_no_correct_key_type():
    original = {"first_name": "Mac"}
    specification = {"first_name": 5}
    result = map_dict(original, specification)
    assert len(result) == 0
def test_map_with_keep_unmapped():
    original = {"first_name": "Mac", "last_name": "Doe", "age": 30, "dob": "1990-01-01"}
    specification = {
        "id": factory(lambda: str(uuid.uuid4())),
        "first_name": skip(),
        "last_name": skip(),
        "full_name": calculate_from_original(
            lambda d: d["first_name"] + " " + d["last_name"]
        ),
    }
    result = map_dict(original, specification, keep_unmapped=True)
    print(result)
    """
    {
        "age": 30,
        "dob": "1990-01-01",
        "id": "160281d4-a243-46b1-b3ac-23e7de7eb178",
        "full_name": "Mac Doe",
    }
    """
    # first_name and last_name are mapped to skip(), so they are removed from the result
    assert "first_name" not in result
    assert "last_name" not in result
    assert result["age"] == 30
    assert result["dob"] == "1990-01-01"
    assert result["full_name"] == "Mac Doe"
def test_map_with_keep_unmapped_with_skip():
    original = {
        "first_name": "Mac",
        "last_name": "Doe",
        "age": 30,
        "dob": "1990-01-01",
        "address": {
            "street": "127 Manuel Fernando de Soto",
            "suburb": "Ciudad de México",
            "postcode": "07469",
        },
    }
    specification = {
        "first_name": skip(),
        "last_name": skip(),
        "home_address": "address",
        "home_address.suburb": skip(),
        "home_address.country.code": skip(),
        "full_name": calculate_from_original(
            lambda d: d["first_name"] + " " + d["last_name"]
        ),
    }
    result = map_dict(original, specification, keep_unmapped=True)
    assert result.get("first_name") is None
    assert result.get("last_name") is None
    assert result["age"] == 30
    assert result["dob"] == "1990-01-01"
    assert result["full_name"] == "Mac Doe"
    assert result["home_address"] is not None
    assert result["home_address"]["postcode"] == "07469"
    assert result["home_address"]["street"] == "127 Manuel Fernando de Soto"
    assert "suburb" not in result["home_address"]
def test_map_with_nested_specs():
    original = {"first_name": "Mac", "last_name": "Doe", "age": 30, "dob": "1990-01-01"}
    specification = {"name.first_name": "first_name", "name.last_name": "last_name"}
    result = map_dict(original, specification)
    assert result["name"]["first_name"] == "Mac"
    assert result["name"]["last_name"] == "Doe"
    assert "age" not in result
    assert "dob" not in result
def test_with_type_no_correct_type():
    # Prepare the data and the converter
    error_converter = with_type(key="dob", rtype=date)
    data = {"dob": "hello"}
    # Execute and verify
    with pytest.raises(Exception):
        error_converter(data)
@pytest.mark.parametrize(
    "val,expected",
    [
        ("true", True),
        ("True", True),
        ("t", True),
        ("1", True),
        ("yes", True),
        ("false", False),
        ("False", False),
        ("f", False),
        ("0", False),
        ("no", False),
    ],
)
def test_with_type_bool(val, expected):
    # Prepare the data and the converter
    bool_converter = with_type(key="isSupervisor", rtype=bool)
    data = {"isSupervisor": val}
    # Execute and verify
    assert bool_converter(data) is expected
def test_with_type_int():
    int_converter = with_type(key="age", rtype=int)
    data = {"age": "30"}
    assert int_converter(data) == 30


def test_with_type_float():
    float_converter = with_type(key="rating", rtype=float)
    data = {"rating": "4.5"}
    assert float_converter(data) == 4.5


def test_with_type_date():
    date_converter = with_type(key="dob", rtype=date, _format="%Y-%m-%d")
    data = {"dob": "1990-01-01"}
    assert date_converter(data) == date(1990, 1, 1)


def test_with_type_datetime():
    datetime_converter = with_type(
        key="last_update", rtype=datetime, _format="%Y-%m-%dT%H:%M:%S.%fZ"
    )
    data = {"last_update": "2020-05-20T15:30:00.000Z"}
    assert datetime_converter(data) == datetime(2020, 5, 20, 15, 30)