Add openldap-etl script and instruction (#1647)
* add openldap-etl scripts, README.md and docker compose file

* remove a misleading comment

* fixed a typo in README.md

Co-authored-by: Liangjun <[email protected]>
liangjun-jiang and Liangjun authored Apr 24, 2020
1 parent c7851b7 commit ec5fbbc
Showing 5 changed files with 364 additions and 0 deletions.
36 changes: 36 additions & 0 deletions contrib/metadata-ingestion/openldap-etl/README.md
@@ -0,0 +1,36 @@
# About this OpenLDAP ETL
The openldap-etl provides an ETL channel to communicate with an OpenLDAP server.

# OpenLDAP Docker Image
**Attention**
> The docker compose file is for a macOS environment. If you are running in a Linux environment, use the official [osixia/docker-openldap](https://github.com/osixia/docker-openldap).

This docker compose file comes with an `OpenLDAP server` and a `Php LDAP Admin` portal, and it is based on [this](https://gist.github.com/thomasdarimont/d22a616a74b45964106461efb948df9c) with modifications.

# Start OpenLDAP and Php LDAP admin
```
docker-compose up
```
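
Before moving on, it can help to confirm that both containers accept connections. The following is a minimal sketch and not part of this directory: `port_is_open` is a hypothetical helper, and the ports are the ones published in `docker-compose.yml` (389 for LDAP, 7080 for the admin portal).

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the docker-compose file in this directory.
    for name, port in (("OpenLDAP", 389), ("phpLDAPadmin", 7080)):
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print("%s on localhost:%d is %s" % (name, port, state))
```

A successful TCP connection only shows that the port is published; it does not prove that the LDAP server has finished its own setup.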
# Log in via phpLDAPadmin
Open `localhost:7080` in your browser and enter the following credentials to log in:
```
Login:cn=admin,dc=example,dc=org
Password:admin
```
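
The login DN mirrors the `LDAP_DOMAIN` value in `docker-compose.yml`: each dot-separated label of `example.org` becomes a `dc=` component. A small sketch of that mapping follows. The helper names are hypothetical, and placing the admin entry at `cn=admin` under the base DN is an assumption taken from the credentials above.

```python
def domain_to_base_dn(domain):
    """Turn a DNS-style domain such as 'example.org' into 'dc=example,dc=org'."""
    labels = [label for label in domain.split(".") if label]
    return ",".join("dc=%s" % label for label in labels)


def admin_dn(domain, admin_cn="admin"):
    """Build the bind DN for the admin entry under the domain's base DN."""
    return "cn=%s,%s" % (admin_cn, domain_to_base_dn(domain))


print(domain_to_base_dn("example.org"))  # dc=example,dc=org
print(admin_dn("example.org"))           # cn=admin,dc=example,dc=org
```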

# Seed Group and Users
Import `sample-ldif.txt` from the phpLDAPadmin portal to set up your organization.
`sample-ldif.txt` contains information about
* group: we set up a `people` group
* people under the `people` group: the members of the `Simpsons` family.
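
To get a feel for what the import creates, the entries of an LDIF file can be listed with a few lines of Python. This is a simplified sketch: `parse_ldif` is a hypothetical helper that ignores folded lines and base64 values (`attr:: ...`), both of which a complete LDIF parser has to handle. The excerpt is abridged from `sample-ldif.txt`.

```python
def parse_ldif(text):
    """Parse simple LDIF text into a list of (dn, {attr: [values]}) tuples.

    Simplification: no folded lines, no base64 ('::') values, no change records.
    """
    entries = []
    dn, attrs = None, {}
    for line in text.splitlines() + [""]:  # trailing "" flushes the last entry
        line = line.rstrip()
        if not line:
            if dn is not None:
                entries.append((dn, attrs))
            dn, attrs = None, {}
            continue
        if line.startswith("#") or line.startswith("version:"):
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "dn":
            dn = value
        else:
            attrs.setdefault(key, []).append(value)
    return entries


SAMPLE = """\
# Entry 4: ou=people,dc=example,dc=org
dn: ou=people,dc=example,dc=org
objectclass: organizationalUnit
objectclass: top
ou: people

# Entry 6: cn=Homer Simpson,ou=people,dc=example,dc=org
dn: cn=Homer Simpson,ou=people,dc=example,dc=org
givenname: Homer
manager: cn=Bart Simpson,ou=people,dc=example,dc=org
objectclass: inetOrgPerson
"""

for entry_dn, entry_attrs in parse_ldif(SAMPLE):
    print(entry_dn, entry_attrs.get("objectclass"))
```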

# Run ETL Script
Once the organization is set up, we can run the `openldap-etl.py` script.
The script queries a user by given name (`Homer`), restricts the returned attributes to a few, and looks up Homer's manager, if there is one.
The script is mostly based on `ldap-etl.py`. However, that script relies on the `sAMAccountName` attribute, which does not exist in OpenLDAP, so we have to modify it a little.
Once we find Homer, we assemble his information and his manager's name into `corp_user_info` and publish it as a message to the `MetadataChangeEvent` topic.
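
The manager lookup works because the `manager` attribute holds a full DN; the script takes its first component and searches for that value as a `displayName`. A minimal sketch of that step, assuming values arrive as `bytes` (as python-ldap returns them under Python 3) and that the DN contains no escaped commas. The helper names are hypothetical.

```python
def first_rdn_value(dn):
    """Return the value of the first RDN, e.g. 'Bart Simpson' for 'cn=Bart Simpson,ou=...'.

    Assumes a simple DN without escaped commas or multi-valued RDNs.
    """
    if isinstance(dn, bytes):
        dn = dn.decode("utf-8")
    first_rdn = dn.split(",", 1)[0]
    _, _, value = first_rdn.partition("=")
    return value.strip()


def manager_search_filter(manager_dn):
    """Build the filter used to look up the manager's entry by display name."""
    return "(displayName=%s)" % first_rdn_value(manager_dn)


manager = b"cn=Bart Simpson,ou=people,dc=example,dc=org"
print(first_rdn_value(manager))        # Bart Simpson
print(manager_search_filter(manager))  # (displayName=Bart Simpson)
```

For values that come from untrusted input, python-ldap's `ldap.filter.escape_filter_chars` can escape filter metacharacters before they are interpolated.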
Run `pip install --user -r requirements.txt`, then run `python openldap-etl.py`. You should see
```
{'auditHeader': None, 'proposedSnapshot': ('com.linkedin.pegasus2avro.metadata.snapshot.CorpUserSnapshot', {'urn': "urn:li:corpuser:'Homer Simpson'", 'aspects': [{'active': True, 'email': 'hsimpson', 'fullName': "'Homer Simpson'", 'firstName': "b'Homer", 'lastName': "Simpson'", 'departmentNumber': '1001', 'displayName': 'Homer Simpson', 'title': 'Mr. Everything', 'managerUrn': "urn:li:corpuser:'Bart Simpson'"}]}), 'proposedDelta': None} has been successfully produced!
```
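
The stray `b'` prefixes and extra quotes in this output are what `str()` produces for a `bytes` value under Python 3, even after the first character is sliced off. The following sketch shows an alternative that decodes the values instead. `decode_first` and `build_corp_user_info` are hypothetical names, and the attribute dictionary only mimics the shape python-ldap returns (attribute name mapped to a list of `bytes`); the script in this directory still prints the output shown above.

```python
def decode_first(attrs, name, default=None):
    """Return the first value of an attribute as text, or default if absent."""
    values = attrs.get(name)
    if not values:
        return default
    value = values[0]
    return value.decode("utf-8") if isinstance(value, bytes) else value


def build_corp_user_info(attrs, manager_name=None):
    """Assemble the corp_user_info aspect from decoded LDAP attributes."""
    full_name = decode_first(attrs, "displayName")
    return {
        "active": True,
        "email": decode_first(attrs, "mail"),
        "fullName": full_name,
        "firstName": full_name.split(" ")[0],
        "lastName": full_name.split(" ")[-1],
        "departmentNumber": decode_first(attrs, "departmentNumber"),
        "displayName": full_name,
        "title": decode_first(attrs, "title"),
        "managerUrn": ("urn:li:corpuser:" + manager_name) if manager_name else None,
    }


# Hand-written stand-in for the attributes returned for Homer's entry.
homer = {
    "displayName": [b"Homer Simpson"],
    "mail": [b"hsimpson"],
    "departmentNumber": [b"1001"],
    "title": [b"Mr. Everything"],
}
info = build_corp_user_info(homer, manager_name="Bart Simpson")
print(info["fullName"], info["managerUrn"])
```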


53 changes: 53 additions & 0 deletions contrib/metadata-ingestion/openldap-etl/docker-compose.yml
@@ -0,0 +1,53 @@
version: '2'
services:
  openldap:
    image: osixia/openldap:latest
    container_name: openldap
    domainname: "example.org"
    hostname: "openldap"
    environment:
      LDAP_LOG_LEVEL: "256"
      LDAP_ORGANISATION: "Example Inc."
      LDAP_DOMAIN: "example.org"
      LDAP_BASE_DN: ""
      LDAP_ADMIN_PASSWORD: "admin"
      LDAP_CONFIG_PASSWORD: "config"
      LDAP_READONLY_USER: "false"
      LDAP_READONLY_USER_USERNAME: "readonly"
      LDAP_READONLY_USER_PASSWORD: "readonly"
      LDAP_RFC2307BIS_SCHEMA: "false"
      LDAP_BACKEND: "mdb"
      LDAP_TLS: "true"
      LDAP_TLS_CRT_FILENAME: "ldap.crt"
      LDAP_TLS_KEY_FILENAME: "ldap.key"
      LDAP_TLS_CA_CRT_FILENAME: "ca.crt"
      LDAP_TLS_ENFORCE: "false"
      LDAP_TLS_CIPHER_SUITE: "SECURE256:-VERS-SSL3.0"
      LDAP_TLS_PROTOCOL_MIN: "3.1"
      LDAP_TLS_VERIFY_CLIENT: "demand"
      LDAP_REPLICATION: "false"
      #LDAP_REPLICATION_CONFIG_SYNCPROV: "binddn="cn=admin,cn=config" bindmethod=simple credentials=$LDAP_CONFIG_PASSWORD searchbase="cn=config" type=refreshAndPersist retry="60 +" timeout=1 starttls=critical"
      #LDAP_REPLICATION_DB_SYNCPROV: "binddn="cn=admin,$LDAP_BASE_DN" bindmethod=simple credentials=$LDAP_ADMIN_PASSWORD searchbase="$LDAP_BASE_DN" type=refreshAndPersist interval=00:00:00:10 retry="60 +" timeout=1 starttls=critical"
      #LDAP_REPLICATION_HOSTS: "#PYTHON2BASH:['ldap://ldap.example.org','ldap://ldap2.example.org']"
      KEEP_EXISTING_CONFIG: "false"
      LDAP_REMOVE_CONFIG_AFTER_SETUP: "true"
      LDAP_SSL_HELPER_PREFIX: "ldap"
    tty: true
    stdin_open: true
    volumes:
      - /var/lib/ldap
      - /etc/ldap/slapd.d
      - /container/service/slapd/assets/certs/
    ports:
      - "389:389"
      - "636:636"
  phpldapadmin:
    image: osixia/phpldapadmin:latest
    container_name: phpldapadmin
    environment:
      PHPLDAPADMIN_LDAP_HOSTS: "openldap"
      PHPLDAPADMIN_HTTPS: "false"
    ports:
      - "7080:80"
    depends_on:
      - openldap
164 changes: 164 additions & 0 deletions contrib/metadata-ingestion/openldap-etl/openldap-etl.py
@@ -0,0 +1,164 @@
#! /usr/bin/python
import sys
import ldap
from ldap.controls import SimplePagedResultsControl
from distutils.version import LooseVersion

LDAP24API = LooseVersion(ldap.__version__) >= LooseVersion('2.4')

LDAPSERVER ='ldap://localhost'
BASEDN ='dc=example,dc=org'
LDAPUSER = 'cn=admin,dc=example,dc=org'
LDAPPASSWORD = 'admin'
PAGESIZE = 10
ATTRLIST = ['cn', 'title', 'mail', 'displayName', 'departmentNumber','manager']
SEARCHFILTER='givenname=Homer'

AVROLOADPATH = '../../metadata-events/mxe-schemas/src/renamed/avro/com/linkedin/mxe/MetadataChangeEvent.avsc'
KAFKATOPIC = 'MetadataChangeEvent'
BOOTSTRAP = 'localhost:9092'
SCHEMAREGISTRY = 'http://localhost:8081'

def create_controls(pagesize):
    """
    Create an LDAP control with a page size of "pagesize".
    """
    if LDAP24API:
        return SimplePagedResultsControl(True, size=pagesize, cookie='')
    else:
        return SimplePagedResultsControl(ldap.LDAP_CONTROL_PAGE_OID, True,
                                         (pagesize, ''))

def get_pctrls(serverctrls):
    """
    Lookup an LDAP paged control object from the returned controls.
    """
    if LDAP24API:
        return [c for c in serverctrls
                if c.controlType == SimplePagedResultsControl.controlType]
    else:
        return [c for c in serverctrls
                if c.controlType == ldap.LDAP_CONTROL_PAGE_OID]

def set_cookie(lc_object, pctrls, pagesize):
    """
    Push latest cookie back into the page control.
    """
    if LDAP24API:
        cookie = pctrls[0].cookie
        lc_object.cookie = cookie
        return cookie
    else:
        est, cookie = pctrls[0].controlValue
        lc_object.controlValue = (pagesize, cookie)
        return cookie

def build_corp_user_mce(dn, attrs, manager_ldap):
    """
    Create the MetadataChangeEvent from the DN and the returned attributes, then produce it.
    """
    # python-ldap returns attribute values as bytes under Python 3; str(...)[1:]
    # drops the leading "b" but keeps the surrounding quotes in the result.
    user_ldap = str(attrs['displayName'][0])[1:]
    full_name = user_ldap
    first_name = full_name.split(' ')[0]
    last_name = full_name.split(' ')[-1]
    email = str(attrs['mail'][0])[1:]
    display_name = user_ldap if 'displayName' in attrs else None
    department = str(attrs['departmentNumber'][0])[1:] if 'departmentNumber' in attrs else None
    title = str(attrs['title'][0])[1:] if 'title' in attrs else None
    manager_urn = ("urn:li:corpuser:" + str(manager_ldap)[1:]) if manager_ldap else None

    corp_user_info = \
        {"active": True, "email": email, "fullName": full_name, "firstName": first_name, "lastName": last_name,
         "departmentNumber": department, "displayName": display_name, "title": title, "managerUrn": manager_urn}
    # sys.stdout.write('corp user info: %s\n' % corp_user_info)

    mce = {"auditHeader": None, "proposedSnapshot":
           ("com.linkedin.pegasus2avro.metadata.snapshot.CorpUserSnapshot",
            {"urn": "urn:li:corpuser:" + user_ldap, "aspects": [corp_user_info]}),
           "proposedDelta": None}

    produce_corp_user_mce(mce)

def produce_corp_user_mce(mce):
    """
    Produce MetadataChangeEvent records
    """
    from confluent_kafka import avro
    from confluent_kafka.avro import AvroProducer

    conf = {'bootstrap.servers': BOOTSTRAP,
            'schema.registry.url': SCHEMAREGISTRY}
    record_schema = avro.load(AVROLOADPATH)
    producer = AvroProducer(conf, default_value_schema=record_schema)

    try:
        producer.produce(topic=KAFKATOPIC, value=mce)
        producer.poll(0)
        sys.stdout.write('\n%s has been successfully produced!\n' % mce)
    except ValueError as e:
        sys.stdout.write('Message serialization failed %s' % e)
    producer.flush()

ldap.set_option(ldap.OPT_X_TLS_REQUIRE_CERT, ldap.OPT_X_TLS_ALLOW)
ldap.set_option(ldap.OPT_REFERRALS, 0)

l = ldap.initialize(LDAPSERVER)
l.protocol_version = 3

try:
    l.simple_bind_s(LDAPUSER, LDAPPASSWORD)
except ldap.LDAPError as e:
    sys.exit('LDAP bind failed: %s' % e)

lc = create_controls(PAGESIZE)

while True:
    try:
        msgid = l.search_ext(BASEDN, ldap.SCOPE_SUBTREE, SEARCHFILTER,
                             ATTRLIST, serverctrls=[lc])
        sys.stdout.write('LDAP searched\n')
    except ldap.LDAPError as e:
        sys.stdout.write('LDAP search failed: %s\n' % e)
        # Stop instead of retrying the same failing search forever.
        break

    try:
        rtype, rdata, rmsgid, serverctrls = l.result3(msgid)
    except ldap.LDAPError as e:
        sys.stdout.write('Could not pull LDAP results: %s\n' % e)
        break

    for dn, attrs in rdata:
        sys.stdout.write('found attrs result: %s\n' % attrs)
        # Skip entries that lack the attributes needed to build a user record.
        if len(attrs) == 0 or 'mail' not in attrs \
                or 'displayName' not in attrs \
                or len(attrs['displayName']) == 0:
            continue
        manager_ldap = None
        if 'manager' in attrs:
            try:
                manager = attrs['manager'][0]
                # str(bytes) yields "b'cn=...", so [5:] strips the "b'cn=" prefix.
                manager_name = str(manager).split(',')[0][5:]
                manager_search_filter = 'displayName=%s' % manager_name
                manager_msgid = l.search_ext(BASEDN, ldap.SCOPE_SUBTREE,
                                             manager_search_filter, serverctrls=[lc])
            except ldap.LDAPError as e:
                sys.stdout.write('manager LDAP search failed: %s\n' % e)
                continue
            try:
                manager_ldap = l.result3(manager_msgid)[1][0][1]['displayName'][0]
            except ldap.LDAPError as e:
                sys.stdout.write('Could not pull managerLDAP results: %s\n' % e)
                continue
        build_corp_user_mce(dn, attrs, manager_ldap)

    pctrls = get_pctrls(serverctrls)
    if not pctrls:
        sys.stderr.write('Warning: Server ignores RFC 2696 control.\n')
        break

    cookie = set_cookie(lc, pctrls, PAGESIZE)
    if not cookie:
        break

l.unbind()
sys.exit(0)
2 changes: 2 additions & 0 deletions contrib/metadata-ingestion/openldap-etl/requirements.txt
@@ -0,0 +1,2 @@
confluent-kafka[avro]==1.1.0
python-ldap==3.2.0
109 changes: 109 additions & 0 deletions contrib/metadata-ingestion/openldap-etl/sample-ldif.txt
@@ -0,0 +1,109 @@
# LDIF Export for dc=example,dc=org
# Server: openldap (openldap)
# Search Scope: sub
# Search Filter: (objectClass=*)
# Total Entries: 9
#
# Generated by phpLDAPadmin (http://phpldapadmin.sourceforge.net) on April 19, 2020 9:52 pm
# Version: 1.2.5

version: 1

# Entry 1: dc=example,dc=org
dn: dc=example,dc=org
dc: example
o: Example Inc.
objectclass: top
objectclass: dcObject
objectclass: organization

# Entry 2: cn=admin,dc=example,dc=org
dn: cn=admin,dc=example,dc=org
cn: admin
description: LDAP administrator
objectclass: simpleSecurityObject
objectclass: organizationalRole
userpassword: {SSHA}JtYNxQ0G8BA6trUuRJx29IH50ck4Ii11

# Entry 3: cn=simpons-group,dc=example,dc=org
dn: cn=simpons-group,dc=example,dc=org
cn: simpons-group
gidnumber: 500
objectclass: posixGroup
objectclass: top

# Entry 4: ou=people,dc=example,dc=org
dn: ou=people,dc=example,dc=org
description: All people in organisation
objectclass: organizationalUnit
objectclass: top
ou: people

# Entry 5: cn=Bart Simpson,ou=people,dc=example,dc=org
dn: cn=Bart Simpson,ou=people,dc=example,dc=org
cn: Bart Simpson
displayname: Bart Simpson
gidnumber: 500
givenname: Bart
homedirectory: /home/users/bsimpson
objectclass: inetOrgPerson
objectclass: posixAccount
objectclass: top
sn: Simpson
title: Mr. Boss
uid: bsimpson
uidnumber: 1000
userpassword: {MD5}4QrcOUm6Wau+VuBX8g+IPg==

# Entry 6: cn=Homer Simpson,ou=people,dc=example,dc=org
dn: cn=Homer Simpson,ou=people,dc=example,dc=org
cn: Homer Simpson
departmentnumber: 1001
displayname: Homer Simpson
gidnumber: 500
givenname: Homer
homedirectory: /home/users/hsimpson
mail: hsimpson
manager: cn=Bart Simpson,ou=people,dc=example,dc=org
objectclass: inetOrgPerson
objectclass: posixAccount
objectclass: top
sn: Simpson
title: Mr. Everything
uid: hsimpson
uidnumber: 1001
userpassword: {MD5}4QrcOUm6Wau+VuBX8g+IPg==

# Entry 7: cn=Lisa Simpson,ou=people,dc=example,dc=org
dn: cn=Lisa Simpson,ou=people,dc=example,dc=org
cn: Lisa Simpson
gidnumber: 500
givenname: Lisa
homedirectory: /home/users/lsimpson
objectclass: inetOrgPerson
objectclass: posixAccount
objectclass: top
sn: Simpson
uid: lsimpson
uidnumber: 1002
userpassword: {MD5}4QrcOUm6Wau+VuBX8g+IPg==

# Entry 8: cn=Maggie Simpson,ou=people,dc=example,dc=org
dn: cn=Maggie Simpson,ou=people,dc=example,dc=org
cn: Maggie Simpson
gidnumber: 500
givenname: Maggie
homedirectory: /home/users/msimpson
objectclass: inetOrgPerson
objectclass: posixAccount
objectclass: top
sn: Simpson
uid: msimpson
uidnumber: 1003
userpassword: {MD5}4QrcOUm6Wau+VuBX8g+IPg==

# Entry 9: ou=Sales Department,dc=example,dc=org
dn: ou=Sales Department,dc=example,dc=org
objectclass: organizationalUnit
objectclass: top
ou: Sales Department
