New Arrival

We welcomed our newest family member in late July – JAJ, already a smart and expressive little girl.  She hasn’t seen much yet, but Mom and Dad are counting the days until she can enjoy the local greenery and other attractions.  So far she has gotten a quick visit to a family farm stand – a sneak preview of a much bigger world!

 

 

First On Call And It Was Fun

This week I went on call for the first time at Acquia and it was actually a lot of fun. As a Cloud Engineer, I’m on the escalation path for issues with our Amazon services. For example, when something goes wrong with an Amazon instance, I file the ticket with Amazon customer support and see it through to resolution.

What’s fantastic is that we have 24-hour support, but I am only on call during my work hours. Our support shifts and on-call rotation “follow the sun.” I have colleagues on my team who work in Europe and Australia. In the morning (from my perspective) my European colleague passes the shift to me, and in the evening I pass the shift to my Australian colleague. Neat!

One of my buddies on the team is taking a vacation that happened to have an on call week right in the middle. I am enjoying it so much that I volunteered to take his shift. Looking forward to getting more exposure to the “hot seat” – so far so good!

Next Actions – Stand Against the Trump Regime

Now that Trump has declared his intention to act against the people of the United States, I reached out to a friend of mine who gave me a list of next actions she and her wife have already taken for fighting the regime.
– Attend the protests & marches
– Sign the petitions on http://front.moveon.org
– Join your local Indivisible chapter. These groups are pretty new and disorganized, so not a lot of unified action yet, but these are providing opportunities to fight the travel ban
– Become a monthly donor to the ACLU, Planned Parenthood, CAIR, MIRA, and others.
– Subscribe to protesting journalists and periodicals – NYT, Boston Globe, Christian Science Monitor
– Find your Congressional representatives https://www.senate.gov/general/contact_information/senators_cfm.cfm and send them daily calls/emails voicing your concern.
For Massachusetts these are:
– Ask friends & family in other states to call their Republican representatives
– Commit to the 10 actions / 100 days campaign from the women’s march: https://www.womensmarch.com/100/.
From Chelsea:  My wife & I had a few friends over on Saturday and we made postcards together.
Clearly we are not artists 🙂 But it was a great way to bond while taking action.

– Sign up for SwingLeft
– Donate to a local mosque and write them a letter expressing support.  Send care packages and letters to Muslim friends, neighbors, students.
– Get involved in our community and get more involved in local politics.   Have you ever thought about running for local office?
– For the spiritually minded, attend church, for support and as a way to become more involved with like-minded local people.
– Speak out on social media and stay tuned for news about protests and other events.
With all the liberties I have enjoyed throughout my lifetime, this is a chance to pay them forward for the next generation.

 

DevOps Insights from REDtalks 14

I recently had the good fortune to encounter Tom McGonagle, SE with F5, via the Boston DevOps chatroom, moderated by Dave Fredricks.  I had been invited to post in Dave’s newly inspired mentor/mentee topic channel, which I welcomed as I had been looking for guidance around a side project of mine.  Tom contacted me through chat, and before the morning was out, we were enjoying a crisp pair of pizzas, the artful pies you can only get in downtown.

We exchanged impressions on working in the tech industry, on the big-hearted, quirky and iconic culture that makes being an engineer among engineers so incredibly rewarding.  We concluded with an invite from Tom to one of the meetups he co-organizes, Hackernest in Artisan’s Asylum, so I marked my calendar and went on my way.

Before the week was out, Tom sent me a link to REDtalks #14: Tom & David on the Principles & Practices of DevOps with host Nathan Pearce, featuring Tom along with fellow DevOps specialist and Bentley U alumnus David Yates.

When I sat down to listen, I expected an informative piece with some new-to-me tidbits here and there.

This podcast captivated me.  Rather than listening passively from one end to the other, I found myself skipping back and forth to make sure I was getting exactly what was being said.  For Tom specifically, as the one who reached out to me, congratulations – this is fantastic.

Here are my (extensive!) notes from this most excellent podcast.

Yates – 6:10 – DevOps Handbook by Gene Kim and the three ways

  1. continuous delivery – testing and QA as a first class object, how do you pull that left in the pipeline and do it early, often, iteratively and incrementally
  2. continuous intelligence – how do you pull it all into a central location and make sense of what was happening in your application and infrastructure
  3. continuous learning – “fail early and fail often”, don’t be afraid to take risks, you can only learn by practicing and getting better, experimentation as culture, that includes getting the components of the infrastructure to harmonize with each other

Yates – 11:30 – teams uniting around a common mission

  • Quarter over quarter, having a common goal as to how the team can get better.  One of those goals can be customer education.

OKRs – Google’s term, objectives and key results

McGonagle – 12:31 – CAMS

  • CAMS are culture, automation, monitoring, and sharing.  Sharing is critical as a devops engineer, devops consultant, or DevOps SME at F5, there is a fiduciary responsibility to share these idea viruses.  One of the idea viruses that I’m hot on right now is the idea of agile networking, it’s my language around the application of agile and devops principles to the field of network engineering … it’s part and parcel of being part of the devops community, you have to share.  As part of my sharing, David and I organize the Boston area Jenkins Meetup group – the largest area Jenkins meetup group in the world.  It’s part of getting out into the community and getting people aware and interested in DevOps.

McGonagle – 14:00 – 9 Practices of DevOps

Practice 1: 14:15 – Configuration Management – you can templatize your configurations and drive your autonomic infrastructures that self-build, self-configure and self-automate

  • Question from Yates on Practice #1: 16:20 – What are the best practices around Configuration management?
  • Answer about best practices from McGonagle at 16:40 –  use facts to drive your configuration, intelligence gathering about the server, self-identifying and self-configuring

Yates – 21:00 – the big motivator for devops is that it’s the marriage of modern management and IT best practices, positive feedback between business requirements and IT delivery

Yates – 21:31 – business reasons that give DevOps legs

Yates – 21:45 – DevOps from all points of view, IT best practices

Practice 2: 22:59 – Continuous integration – a robot such as Jenkins that takes your code from a source code management repository and builds it and tests it in a continuous way, every time a developer commits code the robot tests it against the functional and unit tests, it enables the developers to have awareness of the quality of the code

  • McGonagle – 25:40 – Linting – check the code for the appropriate format, which eliminates an enormous amount of errors, a test that can be orchestrated through a tool like Jenkins

Practice 3: 26:40 Automated testing – TDD, test driven development, build the test into your CI infrastructure, “write the unit test before the code”

  • Yates – 27:53 – TDD is one of the core principles of the XP Agile framework, make sure you know it works before you roll it out, especially for security

Practice 4: 29:15 – Infrastructure as Code – software project for your infrastructure with all the benefits applied to infrastructure, infrastructure is programmable and extensible, saves time and validates the process

  • Yates – 34:14 – canary release – don’t put out a new release everywhere at once, put it out in an isolated deployment so it can be rolled back quickly, if it succeeds then roll it out more widely

Practice 5: 35:40 – Continuous delivery – the way the code is rolled out, there’s a button that’s pushed to release – do you push a button to release?

Practice 6: 35:40 – Continuous deployment – the code constantly goes to production – do you create a button to release?

Practice 7: 18:16 – Continuous monitoring – metrics driven devops, APM – application performance monitoring, instrumenting your code to expose various qualities about your code and infrastructure to a metrics gathering tool

  • McGonagle – 39:27 – ACAMS+ -> add in Agile to culture, automation, monitoring and sharing and what is important to you

Practice 8: 40:30 – Develop an engaged and inclusive culture to encourage collaboration and shared ownership

  • Tom’s Amish barn raising post, culture in which all teams are working toward the same goal
  • Yates – 41:44 – students run three sprints using scrum, the most important thing you can do is own the product you’re going to deliver, having empathy for teammates, easier to say than do

Practice 9: 43:47 – Actively participate in communities of practice to become a lifelong learner of technology development (don’t be a jerk!) – going to conferences, being a speaker, a good participant, a nice person, a listener, the benefit is the learning opportunities it creates

My final takeaway is I am humbled by the privilege of being able to work in an industry distinguished by a culture of enthusiasm, passion and ownership.

While no profession can be exempt from drudgery, the devops culture of cheerful collaboration has, by virtue of its effectiveness, become an accepted prerequisite for deploying a successful product.  As a result, the typical corporate cynicism is mitigated and even replaced by an expressive and generous optimism.  Innovative and disruptive indeed.

Resolving Hadoop Problems on Kerberized CDH 5.X

I ran into a problem in which I had a Kerberized CDH cluster and couldn’t run any hadoop commands from the command line, even with a valid Kerberos ticket.

So with a valid ticket, this would fail:
hadoop fs -ls /
WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Here is what I learned and how I ended up resolving the problem. I have linked to Cloudera doc for the current version where possible, but some of the doc seems to be present only for older versions.

Please note that the problem comes down to a configuration issue but that Kerberos itself and Cloudera Manager were both installed correctly. Many of the problems I ran across while searching for answers came down to Kerberos or Hadoop being installed incorrectly. The problem I had occurred even though both Hadoop and Kerberos were functional, but they were not configured to work together properly.

TL;DR

MAKE SURE YOU HAVE A TICKET

Run klist as the user that will execute the hadoop command.

sudo su - myuser
klist

If you don’t have a ticket, it will print:

klist: Credentials cache file '/tmp/krb5cc_0' not found

If you try to do a hadoop command without a ticket you will get the GSS INITIATE FAILED error by design:
WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

In other words, that is not an install problem. If this is your situation, take a look at http://www.roguelynn.com/words/explain-like-im-5-kerberos/ . For other troubleshooting of Kerberos in general, check out https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html
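
A small guard along these lines can rule out the missing-ticket case before digging further. It is only a sketch: the function name ensure_ticket and its keytab and principal arguments are placeholders, and it relies on klist -s, which MIT Kerberos documents as printing nothing and exiting non-zero when the credentials cache is missing or expired.

```shell
# Sketch: make sure the current user holds a valid ticket before running hadoop.
# ensure_ticket, the keytab path and the principal are illustrative names.
ensure_ticket() {
  keytab=$1
  principal=$2
  # klist -s prints nothing; its exit status says whether a valid ticket exists.
  if klist -s; then
    echo "ticket present"
  else
    echo "no valid ticket, running kinit"
    kinit -kt "$keytab" "$principal"
  fi
}

# Example (run as the user that will issue the hadoop command):
#   ensure_ticket mykeytab myprincipal && hadoop fs -ls /
```

If the guard reports a ticket and hadoop still fails with GSS INITIATE FAILED, the cause is more likely one of the configuration issues described in the rest of this post.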


CDH Default HDFS User and Group Restrictions

A default install of Cloudera has user and group restrictions on execution of hadoop commands, including a specific ban on certain users ( more on page 57 of http://www.cloudera.com/documentation/enterprise/5-6-x/PDF/cloudera-security.pdf ).
There are several properties that deal with this:

Specifically for user hdfs, make sure you have removed hdfs from the banned.users configuration property in hdfs-site.xml configuration if you are trying to use it to execute hadoop commands.

1) Unprivileged User and Write Permissions

The Cloudera-recommended way to execute Hadoop commands is to create an unprivileged user and matching principal, instead of using the hdfs user. A gotcha is that this user also needs its own /user directory and can run into write permissions errors with the /user directory. If your unprivileged user does not have a directory in /user, it may result in the WRITE permissions denied error.

Cloudera Knowledge Article

http://community.cloudera.com/t5/CDH-Manual-Installation/How-to-resolve-quot-Permission-denied-quot-errors-in-CDH/ta-p/36141
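
As a rough illustration of the fix for the missing /user directory, the helper below prints the two HDFS commands involved so they can be reviewed before running. The function name home_dir_commands is made up for this sketch, and it assumes the user's group has the same name as the user, which may not hold on your cluster.

```shell
# Sketch: print the HDFS commands that give an unprivileged user a home directory.
# The function name and the assumption that group == user name are illustrative.
home_dir_commands() {
  user=$1
  echo "hadoop fs -mkdir -p /user/$user"
  echo "hadoop fs -chown $user:$user /user/$user"
}

# Review the output first, then run it as the HDFS superuser, for example:
#   home_dir_commands myuser | sudo -u hdfs sh
```

On a Kerberized cluster the superuser running these commands needs its own valid ticket as well.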

2) Datanode Ports and Data Directory Permissions
Another related issue is that Cloudera sets the permissions on dfs.datanode.data.dir to 750 on a non-kerberized cluster, but requires 700 on a kerberized cluster. With the wrong dir permissions set, the Kerberos install will fail. The ports for the datanodes must also be set to values below 1024; the recommended values are 1004 for the Datanode port and 1006 for the HTTP port.

Datanode Directory

http://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_ig_hdfs_cluster_deploy.html

Datanode Ports

http://www.cloudera.com/documentation/archive/manager/4-x/4-7-2/Configuring-Hadoop-Security-with-Cloudera-Manager/cmchs_enable_security_s9.html
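
Both requirements can be checked from the command line. The helpers below are a sketch with made-up names (dir_mode, is_privileged_port); stat -c is the GNU coreutils form, and the data directory path in the example is hypothetical.

```shell
# Sketch: sanity checks for DataNode settings on a Kerberized cluster.
# Function names are illustrative; stat -c is the GNU coreutils form.

# Print the octal permission bits of a directory, e.g. 700.
dir_mode() {
  stat -c '%a' "$1"
}

# Succeed when the port in a host:port address is below 1024.
is_privileged_port() {
  port=${1##*:}
  [ "$port" -gt 0 ] && [ "$port" -lt 1024 ]
}

# Example, reading the live value with hdfs getconf:
#   dir_mode /data/dfs/dn      (hypothetical path; check each dir in dfs.datanode.data.dir)
#   is_privileged_port "$(hdfs getconf -confKey dfs.datanode.address)" || echo "port too high"
```

The same port check applies to dfs.datanode.http.address.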

3) Service Specific Configuration Tasks

On page 60 of the CDH security doc, there are steps to kerberize Hadoop services. Make sure you did these!

MapReduce

sudo -u hdfs hadoop fs -chown mapred:hadoop ${mapred.system.dir}

HBase

sudo -u hdfs hadoop fs -chown -R hbase ${hbase.rootdir}

Hive

sudo -u hdfs hadoop fs -chown hive /user/hive

YARN

rm -rf ${yarn.nodemanager.local-dirs}/usercache/*

All of these steps EXCEPT for the YARN one can happen at any time. The step for YARN must happen after Kerberos installation because what it is doing is removing the user cache for non-kerberized YARN data. When you run mapreduce after the Kerberos install it should populate this with the Kerberized user cache data.

YARN User Cache
http://stackoverflow.com/questions/29397509/yarn-application-exited-with-exitcode-1000-not-able-to-initialize-user-directo

Kerberos Principal Issues

1) Short Name Rules Mapping
Kerberos principals are “mapped” to the OS-level service users. For example, hdfs/WHATEVER@REALM maps to the service user ‘hdfs’ in your operating system only because of a name mapping rule set in Hadoop’s core-site.xml. Without name mapping, Hadoop wouldn’t know which user is authenticated by which principal.

If you are using a principal that should map to hdfs, make sure the principal name resolves correctly to hdfs according to these Hadoop rules.

Good
(has a name mapping rule by default)

  • hdfs@REALM
  • hdfs/_HOST@REALM

Bad
(no name mapping rule by default)

  • hdfs-TAG@REALM

The “bad” example will not work unless you add a rule to accommodate it.

Name Rules Mapping
http://www.cloudera.com/documentation/archive/cdh/4-x/4-5-0/CDH4-Security-Guide/cdh4sg_topic_19.html
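
To make the good and bad cases concrete, here is a sketch that approximates Hadoop's DEFAULT mapping for a principal in the default realm, which keeps only the first component of the name. The helper short_name is illustrative and is not a Hadoop command. The rule shown in the comments is one plausible shape for mapping the tagged principal, not a rule taken from my cluster.

```shell
# Sketch: approximate Hadoop's DEFAULT mapping for a principal in the default realm,
# which keeps only the first component of the principal name.
# short_name is an illustrative helper, not a Hadoop command.
short_name() {
  name=${1%@*}          # drop @REALM
  echo "${name%%/*}"    # drop /host if present
}

# short_name hdfs/_HOST@REALM  prints hdfs
# short_name hdfs-TAG@REALM    prints hdfs-TAG, which is not the hdfs service user
#
# A rule of this shape in hadoop.security.auth_to_local (core-site.xml) would map it:
#   RULE:[1:$1@$0](hdfs-TAG@REALM)s/.*/hdfs/
# To see what Hadoop itself resolves, run:
#   hadoop org.apache.hadoop.security.HadoopKerberosName hdfs-TAG@REALM
```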

2) Keytab and Principal Key Version Numbers Must Match
The Key Version Number (KVNO) is the version of the key that is actively being used. Think of a house key: if you change the lock on the door, the old key is no longer any good. Both the keytab and the principal have a KVNO, and the two numbers must match.

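By default, when you use ktadd or xst to export a principal to a keytab, kadmin generates new random keys for the principal and increments its KVNO. The new keytab carries the new number, but any keytab exported earlier still holds the old one, so you can end up accidentally creating a mismatch.

Use -norandkey with kadmin.local (or kadmin, where your build supports it) when exporting a principal to a keytab. It exports the existing keys without re-randomizing them, so the KVNO stays the same and previously exported keytabs remain valid.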

In general, whenever you have principal or authentication issues, make sure to check that the KVNO of the principal and keytab match:
Principal
kadmin.local -q 'getprinc myprincipalname'

Keytab
klist -kte mykeytab

Creating Principals
http://www.cloudera.com/documentation/archive/cdh/4-x/4-3-0/CDH4-Security-Guide/cdh4sg_topic_3_4.html
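
The KVNO comparison can be scripted rather than read by eye. The two helpers below are a sketch: principal_kvno and keytab_kvno are made-up names, and the parsing assumes the output layouts noted in the comments, which may differ between Kerberos versions and locales.

```shell
# Sketch: compare the KVNO held by the KDC with the one stored in a keytab.
# principal_kvno and keytab_kvno are illustrative helpers that read command output on stdin.

# Highest key version in `kadmin.local -q 'getprinc NAME'` output.
# Assumes key lines of the form: "Key: vno 3, aes256-cts-hmac-sha1-96"
principal_kvno() {
  awk '/^Key: vno/ { v = $3; sub(/,/, "", v); if (v + 0 > max) max = v + 0 }
       END { print max + 0 }'
}

# Highest KVNO for one principal in `klist -kte KEYTAB` output.
# Assumes entry lines of the form: "   3 01/01/2016 00:00:00 hdfs@REALM (enctype)"
keytab_kvno() {
  awk -v p="$1" '$1 ~ /^[0-9]+$/ && $4 == p { if ($1 + 0 > max) max = $1 + 0 }
                 END { print max + 0 }'
}

# Example:
#   kdc=$(kadmin.local -q 'getprinc myprincipalname' | principal_kvno)
#   kt=$(klist -kte mykeytab | keytab_kvno myprincipalname@REALM)
#   [ "$kdc" = "$kt" ] || echo "KVNO mismatch: KDC has $kdc, keytab has $kt"
```

Both helpers print 0 when they find no matching line, so a 0 on either side usually means the output format differs from what the sketch assumes.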

Security Jars and JAVA Home

1) Java Version Mismatch with JCE Jars
Hadoop needs the Java security JCE Unlimited Strength jars installed in order to use AES-256 encryption with Kerberos. Both Hadoop and Kerberos need to have access to these jars. This is easy to miss because you can think you have the security jars installed when you really don’t.

JCE Configurations to Check

  • the jars are the right version – the correct security jars are bundled with Java, but if you install them after the fact you have to make sure the version of the jars corresponds to the version of Java or you will continue to get errors.
    To troubleshoot, check the md5sum hash of the JCE jars from a brand new download of the same exact JDK that you’re using against the md5sum hash of the ones on the Kerberos server.
  • the jars are in the right location ( JAVA_HOME/jre/lib/security )
  • Hadoop is configured to look for them in the right place. Check if there is an export statement for JAVA_HOME to the correct Java install location in /etc/hadoop/conf/hadoop-env.sh

If Hadoop has JAVA_HOME set incorrectly it will fail with GSS INITIATE FAILED. If the jars are not in the right location, Kerberos won’t find them and will give an error that it doesn’t support the AES-256 encryption type (UNSUPPORTED ENCTYPE).

Cloudera with JCE Jars
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_sg_s2_jce_policy.html

Troubleshooting JCE Jars
https://community.cloudera.com/t5/Cloudera-Manager-Installation/Problem-with-Kerberos-amp-user-hdfs/td-p/6809
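
The checks above can be run with two small helpers. This is a sketch with made-up function names (same_md5, exported_java_home). The jar names in the example, local_policy.jar and US_export_policy.jar, are the usual JCE policy files, and the fresh-JDK path is a placeholder for wherever you unpacked a clean download.

```shell
# Sketch: helpers for the JCE and JAVA_HOME checks. Function names are illustrative.

# Succeed when two files have identical MD5 digests.
same_md5() {
  [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Print the JAVA_HOME value exported in a hadoop-env.sh style file, if any.
# A quoted value is printed with its quotes.
exported_java_home() {
  sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=//p' "$1" | tail -n 1
}

# Example:
#   for jar in local_policy.jar US_export_policy.jar; do
#     same_md5 "$JAVA_HOME/jre/lib/security/$jar" "/path/to/fresh-jdk/jre/lib/security/$jar" \
#       || echo "$jar differs from the fresh download"
#   done
#   exported_java_home /etc/hadoop/conf/hadoop-env.sh
```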

Ticket Renewal with JDK 6 and MIT Kerberos 1.8.1 and Higher

Cloudera has an issue documented at http://www.cloudera.com/documentation/archive/cdh/3-x/3u6/CDH3-Security-Guide/cdh3sg_topic_14_2.html in which tickets must be renewed before hadoop commands can be issued. This only happens with Oracle JDK 6 Update 26 or earlier and package version 1.8.1 or higher of the MIT Kerberos distribution. To check the package, do an rpm -qa | grep krb5 on CentOS/RHEL or aptitude search krb5 -F "%c %p %d %V" on Debian/Ubuntu.

The workaround given by Cloudera is to do a regular kinit as you would, then do a kinit -R to force the ticket to be renewed.
kinit -kt mykeytab myprincipal
kinit -R

And finally, the issue I actually had which I could not find documented anywhere …

Configuration Files and Ticket Caching


There are two important configuration files for Kerberos, the krb5.conf and the kdc.conf. The krb5.conf configures the Kerberos client libraries and tools such as kinit, while the kdc.conf configures the krb5kdc service and the KDC database. My problem was the krb5.conf file had a property:
default_ccache_name = KEYRING:persistent:%{uid}

This set my cache name to KEYRING:persistent and user uid ( explained at https://web.mit.edu/kerberos/krb5-1.13/doc/basic/ccache_def.html ). When I did a kinit, it created the ticket in /tmp because the cache name was being set elsewhere as /tmp. Cloudera services obtain authentication with files generated at runtime in /var/run/cloudera-scm-agent/process , and these all export the cache name environment variable ( KRB5CCNAME ) before doing their kinit. That’s why Cloudera could obtain tickets but my hadoop user couldn’t.

The solution was to remove the line from krb5.conf that set default_ccache_name and allow kinit to store credentials in /tmp , which is the MIT Kerberos default value DEFCCNAME ( documented at https://web.mit.edu/kerberos/krb5-1.13/doc/mitK5defaults.html#paths )
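
To check whether a cluster is in the same situation, it helps to look at every place the cache name can come from. The helper below is a sketch; ccache_setting is a made-up name, and the export shown at the end is an untested alternative to editing krb5.conf, based on the MIT behavior that KRB5CCNAME takes precedence over default_ccache_name.

```shell
# Sketch: show which credential cache settings are in effect.
# ccache_setting is an illustrative helper that reads a krb5.conf style file.
ccache_setting() {
  sed -n 's/^[[:space:]]*default_ccache_name[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example:
#   ccache_setting /etc/krb5.conf     prints KEYRING:persistent:%{uid} if the line is present
#   echo "${KRB5CCNAME:-unset}"        shows any per-session override
#   klist                              reports the cache in use on its "Ticket cache:" line
#
# Instead of editing krb5.conf, a single session can point at a file cache the
# way the Cloudera process scripts do, by exporting KRB5CCNAME before kinit:
#   export KRB5CCNAME=FILE:/tmp/krb5cc_$(id -u)
#   kinit -kt mykeytab myprincipal
```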

Liked this post and want to hear more? Follow me at https://twitter.com/saranicole and connect at https://www.linkedin.com/in/sarastreeter

Cloudera and Kerberos installation guides

Step-by-Step
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_sg_intro_kerb.html
Advanced troubleshooting
http://www.cloudera.com/documentation/enterprise/5-6-x/PDF/cloudera-security.pdf , starting on page 48

Update on Minecraft on Digital Ocean

DevOps Day is this week! For the presentation I have a brand-y new Ansible playbook up on Github that lets anyone roll their very own Minecraft server on Digital Ocean.

Check it out at github.com/saranicole/stem-minecraft. You can also see how the playbook works in action on Asciinema at asciinema.org/a/2gojihwmv3k8urg2oujppe66q.

*Even Later Update:
Slides are posted at http://slideshare.net/saranicole1980/building-stem-with-minecraft
Gorgeous “Eleanor” PowerPoint Template available at http://www.slidescarnival.com/eleanor-free-presentation-template/308
Video: https://www.youtube.com/watch?v=FsfjWMs67DE
DevOps Days Boston Speaker profile https://www.devopsdays.org/events/2016-boston/program/sara-jarjoura/

Enjoy!

DevOps Engineering at Teradata

In May I switched companies and tech fields – I am now working as a DevOps Engineer at Teradata.  During my last year at Axeda I had the opportunity to work on projects dealing with Amazon Web Services and EC2, and I became fascinated with the idea of managing infrastructure in code.  I am more on the “Dev” side than the “Ops” side of “DevOps”, as that is my background; however, my experience with Linux at Free Geek Providence gives me an advantage when dealing with systems.

I have now completed my first project at Teradata for deploying Hadoop clusters.  It’s a Node.js app that fronts a set of Chef cookbooks to deploy any of four different distributions of Hadoop on a virtual machine cluster.  The problem that it solves is that Teradata has products that need to work well on various flavors and versions of Hadoop, and the people who test these combinations have a hard time creating the dev environments.  They would have less trouble with the better known vendors such as Hortonworks or Cloudera, but the less commonly used vendors such as IBM BigInsights or MapR would present challenges in setting up due to their unfamiliarity.

The app I created is similar in purpose to Cloudbreak; however, Cloudbreak (at the time of writing) is only intended for use with Ambari-based vendors such as Hortonworks or BigInsights.

The best part is that this first project is only the tip of the iceberg in learning about cutting edge big data technologies.  Next up I will be learning OpenStack, which is an open source solution for creating an entire cloud infrastructure – think creating your own private Amazon EC2.  OpenStack is on track to become increasingly important as companies look to take back control of their infrastructure.

Shameless plug:  my Boston-based team is hiring, check out our Teradata careers page and say hi to me on LinkedIn or Twitter.

Hackathon – Sara’s Rules of Thumb

Hackathons are short events, a day or a few days long, in which hackers prototype a technical solution to a business problem and then pitch and demo that solution to an audience. I’ve been a coach and organizer for three Hackathons and I’ve learned a few rules of thumb that make Hackathons more fun and valuable for the participants.

1) Make the device + sensors combination awesome

The target audience for the Hackathons we run is hardware hackers as well as IoT hackers, and if you get the right device – say the newest release of Arduino plus an a la carte selection of 50 sensors – their eyes light up and the creative juices start flowing. Pick the wrong device and you get to hang out with them on the couch wondering how to help them figure out what to do.

2) Code shoulder-to-shoulder with the hackers

Don’t show up and put your feet up on the table and say, “well I’m here so I’ve done my job!” No, get down and get dirty in the code, help out with code snippets and make things *work*. They’re here to turn an idea into reality and the best outcome is if the help of the coaches is available to make everyone successful. Let the best idea be implemented the best way it can and whaddya know cool things will happen.

3) Pick the right judges

Judges hold the power of the purse over the heads of the hackers. Prize money goes where the judges deem it worthy to go. For that reason you want judges who represent a few different perspectives – a business guy, a software gal, a hardware person, an investor – so the hackers have the best chance to impress at least one of them with their demo. Judges can give value to hackers simply by offering plenty of feedback, so make sure to select for experience and expertise in their area.

4) Help them with their demo

Putting a solution together is only part of the challenge of participating in a Hackathon; the other part is presenting it in an exciting and professional way. While working with hackers on the technical implementation, don’t forget to ask them if they have a slide or two on their project that explains its business value. Ask if they have a business-oriented team member and if they don’t, find them one.

Hackathons are fun and sometimes extreme events with coding going all through the day into the night and the morning. The experience alone makes it worth participating in one and who knows, you might be surprised at what you can put together!

9 Connected Hacks Covered By O’Reilly

The AT&T Hackathon in January was a blast, and my coverage of it as an M2M coach for the Arduino hardware gained some attention.

Check out the O’Reilly post at http://blog.makezine.com/2013/01/30/results-from-arduino-hackathon-at-atts-2013-developer-summit and the original at http://developer.axeda.com/community/blog/9-connected-hacks-rocked-mobile-app-space-2013