walls.corpus

By Nathan L. Walls

Articles tagged “engineering”

wget of mass destruction

David E. Sanger and Eric Schmitt, reporting for the New York Times, have published an article titled “Snowden Used Low-Cost Tool to Best N.S.A.”. I know they’re reporting for a general audience, but I believe the article does a disservice by allowing anonymous national security “officials” to put simple automation into scare quotes:

Using “web crawler” software designed to search, index and back up a website, Mr. Snowden “scraped data out of our systems” while he went about his day job, according to a senior intelligence official. “We do not believe this was an individual sitting at a machine and downloading this much material in sequence,” the official said. The process, he added, was “quite automated.”

The findings are striking because the N.S.A.’s mission includes protecting the nation’s most sensitive military and intelligence computer systems from cyberattacks, especially the sophisticated attacks that emanate from Russia and China. Mr. Snowden’s “insider attack,” by contrast, was hardly sophisticated and should have been easily detected, investigators found.

Automation gonna automate, I suppose. Having seen this dance with Aaron Swartz, Chelsea Manning and Edward Snowden, the national security-industrial complex takes a disingenuously naïve view of automation tools, particularly around Swartz at MIT and Snowden, suggesting there was a mix of luck and quite possibly something nefarious behind all this automation. The New York Times should approach statements made by agency officials skeptically. This sort of programming is not hard. Moreover, no one has to work particularly hard to hide it. In fact, what might look to some like “hiding” is simply polite engineering viewed under a different lens.

One key is the not-at-all-advanced concept of throttling. Well-behaved web crawlers (also known as spiders) are respectful about how many requests they issue in a given amount of time. A lot of requests all at once will attract the very sort of attention that unnamed officials seem loath to admit Snowden barely drew to himself.
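For a sense of how unremarkable this is, here is a minimal sketch of polite mirroring with stock wget. The flags are standard wget options; the host is a placeholder, not any real system:

    #!/bin/sh
    # Mirror a site politely: pause between requests, vary the pause,
    # and cap bandwidth so the traffic blends into normal usage.
    # intranet.example is a placeholder host.
    wget --mirror \
         --wait=2 \
         --random-wait \
         --limit-rate=200k \
         http://intranet.example/wiki/

The --random-wait flag varies the pause around the --wait value precisely so requests don’t form a metronomic, obviously scripted pattern in the logs.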

First, lots of requests in a short amount of time show up in log files and quickly become a pattern. Patterns attract attention. Assuming the NSA and its various contractors audit access logs (auditing is itself something I’d automate), spreading requests over time makes them less likely to arouse suspicion. Moreover, unless an audit is looking for a particular type of activity, that manual or automated audit will not care a whit about well-throttled crawler traffic, because it looks a lot like expected traffic. It’s “hiding” to the same degree someone of average height and dress is “hiding” as they walk down a Manhattan sidewalk.

Second, setting aside any activity logs, system activity monitors seem more likely to catch a misbehaving web crawler. System activity monitors look at how much work a machine is doing at a given time. Typical checks look at how busy the CPU is, how much RAM is in use, overall network activity, what processes (“programs”) are running and so on. Some servers have automated checks in place, some don’t. For the sake of discussion, I’ll assert the servers hosting the content Snowden accessed were monitored in such a fashion. Now, assume each server’s activity varies but stays within an average band. Unless what Snowden was doing with his web crawler pushed one of these checks out of bounds, it was unlikely to attract attention. Normal activity gets ignored.
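To illustrate how coarse those checks are, here’s the rough shape of a load-average check in shell. The threshold and alert address are hypothetical, and a real shop would more likely use a tool like Nagios, but the principle is the same:

    #!/bin/sh
    # Alert if the one-minute load average crosses a threshold.
    # A well-throttled crawler will rarely trip a check this coarse.
    THRESHOLD=4.0
    LOAD=$(uptime | sed 's/.*load average[s]*: //' | cut -d, -f1)
    if [ "$(echo "$LOAD > $THRESHOLD" | bc)" -eq 1 ]; then
        # ops@example.internal is a placeholder address
        echo "high load: $LOAD" | mail -s "load alert" ops@example.internal
    fi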

On to the alleged crawling software itself.

In interviews, officials declined to say which web crawler Mr. Snowden had used, or whether he had written some of the software himself. Officials said it functioned like Googlebot, a widely used web crawler that Google developed to find and index new pages on the web. What officials cannot explain is why the presence of such software in a highly classified system was not an obvious tip-off to unauthorized activity.

First, Snowden’s job was as a systems administrator. Systems administration and development jobs involve access to not-in-any-way-top-secret technologies like *NIX servers, which typically have a wide array of built-in scripting languages (Perl and Python most likely, Ruby very possibly). Or perhaps Snowden is a shell scripter. Bash will get the job done.

As software goes, a basic web crawler is not exceptionally hard. I assert that if it’s written with tools likely already resident on any average server or *NIX-based laptop (e.g., Mac OS X, Linux, possibly Windows with PowerShell), there’s really nothing about one that would raise any particular suspicion. Effectively, the raw pieces of the web crawler were quite likely already present. Writing a text file to marshal those raw pieces together is unlikely to raise suspicion because a systems administrator or software developer already has scores of similar files lying around. There’s no magic “web crawler” bit that flips and alerts anyone.
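As a hedged illustration, here’s roughly what such a crawler could look like in nothing but Bash and stock tools. It assumes bash 4 for associative arrays; the intranet.example URLs are placeholders and the link extraction is deliberately naive:

    #!/bin/bash
    # Toy breadth-first crawler from standard parts: curl, grep, sed.
    # A real one would also save each page; this just walks the links.
    queue=("http://intranet.example/index.html")
    declare -A seen
    while ((${#queue[@]})); do
        url=${queue[0]}
        queue=("${queue[@]:1}")          # pop the front of the queue
        [[ ${seen[$url]} ]] && continue  # skip pages we've visited
        seen[$url]=1
        page=$(curl -s "$url") || continue
        # harvest same-host links and append them to the queue
        while read -r link; do
            queue+=("$link")
        done < <(grep -oE 'href="http://intranet\.example[^"]*"' <<<"$page" |
                 sed 's/^href="//; s/"$//')
        sleep 3   # throttle: one request every few seconds
    done

Every piece of that is something a systems administrator touches daily. Nothing in it is a “web crawler” in any way a file scanner could flag.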

As a thought experiment, what happens if every machine is audited and new and modified files are flagged, logged and sent off somewhere for analysis? Probably nothing. In a large working group, a lot of these files will look very similar to one another and have innocuous or cryptic names, and writing meaningful software to determine what all of these new files are for and, if they’re programs, what they do would be a nigh-impossible task. Surely, no one is going to look at each of these files by hand. It’d be soul-sucking work.
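The flagging half of that audit really is a one-liner; it’s the analysis half that’s nigh-impossible. A sketch, with an illustrative path:

    # Flag files created or modified in the last day under home directories.
    # Deciding what each flagged file is *for* is the part no one can automate.
    find /home -type f -mtime -1 -print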

Put another way: hammers, screwdrivers, wrenches, pliers, saws and knives aren’t noteworthy tools in a toolbox. A new hammer on a construction site is unlikely to draw any attention. Similarly, just as carpenters use jigs, painters use scaffolding and auto mechanics use impact wrenches, ramps and hydraulic lifts to make their jobs easier, faster, more consistent and less tedious, systems engineers and developers use scripts. Now, imagine a construction site or factory inspecting everyone’s tool bag and workspace constantly for anything “inappropriate”. It wouldn’t be terribly effective, and it’d be a huge burden and expense on the actual work. Imagine your average TSA security line at the office park.

There’s also some question about the web crawler having Snowden’s credentials:

When inserted with Mr. Snowden’s passwords, the web crawler became especially powerful. Investigators determined he probably had also made use of the passwords of some colleagues or supervisors.

But he was also aided by a culture within the N.S.A., officials say, that “compartmented” relatively little information. As a result, a 29-year-old computer engineer, working from a World War II-era tunnel in Oahu and then from downtown Honolulu, had access to unencrypted files that dealt with information as varied as the bulk collection of domestic phone numbers and the intercepted communications of Chancellor Angela Merkel of Germany and dozens of other leaders.

Officials say web crawlers are almost never used on the N.S.A.’s internal systems, making it all the more inexplicable that the one used by Mr. Snowden did not set off alarms as it copied intelligence and military documents stored in the N.S.A.’s systems and linked through the agency’s internal equivalent of Wikipedia.

As noted above, there’s nothing particularly special about a web crawler versus any other manner of script. It’s easy to hand utilities like wget and curl authentication parameters and have them keep login cookies. It’s also easy for such a web crawler to announce itself to the server however it likes. There’s a convention of sending an identification string, as Google and Yahoo do for their web crawlers, but it’s just as easy to call a web crawler Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko, i.e., Internet Explorer 11. Add the polite engineering of not requesting every page the crawler sees the instant it processes the preceding one, and it’s far less obvious that traffic to a web server is coming from a script instead of a human clicking links. There’s not necessarily anything nefarious going on.
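A hedged sketch of those pieces together, using real wget options; the host, form fields and credentials are placeholders, and the user-agent string is IE11’s, verbatim:

    #!/bin/sh
    # Log in once, keep the session cookie, then mirror politely while
    # announcing ourselves as a stock browser.
    wget --save-cookies=cookies.txt --keep-session-cookies \
         --post-data='user=jdoe&pass=hunter2' \
         http://intranet.example/login

    wget --load-cookies=cookies.txt \
         --user-agent='Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko' \
         --mirror --wait=2 --random-wait \
         http://intranet.example/wiki/

In a server’s access logs, that second command reads as someone browsing with Internet Explorer at a human pace.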

If Snowden had access to all of these systems, and accessing what sounds equivalent to a corporate intranet was not going to arouse suspicion, there’s little I can think of about this conceptual web crawler that would tip the balance toward his being caught. If the NSA wasn’t going to catch Snowden doing all of the work himself, it’s no more likely it was going to catch an automated process he wrote.

I don’t find any part of this story surprising from a technical standpoint. What I do find somewhat distressing is that unnamed officials think this is special or confers villainous status on Snowden. It doesn’t, just as it should not have with Aaron Swartz. Said officials should know better, and if they don’t, they need to find technical advisors who will correctly inform them.

I bring this all up because I would like reporters on stories such as this one to find an average systems administrator, security analyst or software engineer to provide perspective. The New York Times has an excellent digital staff with developers who could easily demonstrate what a similar script would look like and how it would work. Surely a news organization that builds great interactive stories and is growing more comfortable in its own skin online can draw on that experience to call officials on bad, self-serving analysis like this.

Portfolios for software engineers

It’s entirely possible that you’re thrilled to bits with your current gig. You haven’t seen the need to update your résumé in months or years. So, I totally get it when I tell you to assemble a portfolio and you respond with, “I’m happy, I’m not going anywhere anytime soon.” Build a portfolio anyway.

In my last piece, I talked about avoiding career stagnation and working where your work is valued. Building and maintaining a portfolio is an insurance policy against stagnation and the “go bag” for your career.

Background

There’s plenty out there on programming interview questions and answers, so I’m bypassing that. Same with résumés and cover letters. I don’t see many candidates with a portfolio of past work, so that’s why I’m telling you to have one. You will automatically stand apart.

Recommending a candidate for interview or hire is a lot easier when I can see how that candidate approaches problems. Onsite interviews covering design exercises and problem solving cover some of this. A take-home exercise might cover another part. However, that’s still oftentimes more a hint at direction than true comfort. The interview itself relies on candidates telling an interviewer a story (and make no mistake, I want to hear how you explain how you did something). Still, four or five hours of conversation with a group of hiring managers or potential peers is barely enough time for them to form a solid opinion of you.

A portfolio brings an entirely different dimension to bear. I get a feel for how you write code completely outside the context of the interview process. I see how you think through solving a problem with design, how you model a domain. It’s a window into how you think.

What’s in your portfolio

There should be some manner of map to what’s in the portfolio. Let’s first stipulate this portfolio is online and publicly available. What I would like is a straight-ahead path to get there; ideally, a simple, direct URL. So, let’s also stipulate I’m looking at an index page that gives me a brief amount of information about you, pointers to more (a blog, Twitter, etc.) and the main body of the portfolio itself. The number of items in the portfolio isn’t particularly important, so long as there’s more than one or two things there and I’m not seeing absolutely everything you’ve ever created a repository for. Select for breadth or depth, but know what you’re deciding and why.

I would love to see a mix of complete projects, pull requests to other projects, bug reports and even some toy code. That’s simply the mix I’d be most comfortable with; vary it as you like, so long as you can articulate your choices. Choose what best demonstrates your goals and practices.

For complete projects, a README describing the project’s purpose, installation process, and support and contribution practices is the first thing I’ll read. Before I read any code, I’ll try to install it. Then I’ll probably dive into the code structure itself. Can I make sense of how the code is structured? Is it structured the way I’d expect other code in the same language to be? From there, I’ll likely dive into individual files to get an understanding of the classes, frameworks and so on in use. After that, I’m going to read your commit messages. How do you work with source control? I’m looking for commits that explain the why behind the what. I’m also going to take note of whether or not your project has documentation, how your code is commented (or not) and your code style. There’s no one right or wrong answer here. I do like to see consistency, clarity and readability.
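To illustrate “the why behind the what,” here’s the shape of a commit message I like to find. The scenario is invented; git concatenates multiple -m flags into subject and body paragraphs:

    # Subject states what changed; body explains why it had to.
    git commit -m 'Throttle outbound API retries' \
               -m 'Unbounded retries were tripping the vendor rate limit
    during nightly syncs. Backing off exponentially keeps a transient
    failure from cascading into an hour-long outage.'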

Pull requests are a little different. I presume there’s an originating issue I can look at to see how you interacted with the project in question, in either opening the issue or proposing a solution with the pull request. If the project initially turned down or requested changes in your pull request, how did you handle that?

Toy code is great to show how you might address a problem like FizzBuzz, practice code kata, try out different sorting algorithms or learn a new language, to name some possibilities. I get a sense of how you play to learn.

Now, here’s what I care less about: what language you write code in. Sure, I can map your experience to mine more easily if I’m familiar with the language you work in and it’s one you’d be using in the position you’re being considered for. At the same time, I care more about how I see you work on your project and how you organize it. I care more about seeing experience in one language that I expect you can apply to a cousin.

Work vs. personal projects

Tricky, tricky. Generally speaking, what you write at your day job is work-for-hire and code you don’t own. Businesses are understandably touchy about having the processes and methods at the core of a product out in the wild. Still, it’s nice to have something to share. So, get creative. Is there anything your shop has done that’s in support of your product but isn’t actually your product? “Sell it” by open sourcing it. You’ll need your boss and fellow engineers to support making the case to the business. What is that case? Well, consider that lots of engineers are attracted to workplaces that contribute back to the sorts of open source communities they build their products on. Engineers like you. There’s more to this than what I’ve covered here. Do your research, and be willing to support the project rather than publish abandonware. Still, it’s a way of liberating some code for a portfolio.

Short of open sourcing an entire tool, see if you can publish some samples. Ask first, and be clear when asking that you intend to publish.

Failing that, isn’t there some itch you’ve been meaning to scratch with a project of your own? It doesn’t have to be a 25-model domain. Pick a small problem, solve it in a way you want to see it solved and publish. Get feedback and iterate.

Parting thoughts

A portfolio is one piece of many considered in a hiring decision. It’s put together and refined over months and years, with new pieces added and others removed as you learn and evolve over the course of a career. A portfolio separates you from other candidates. Again, not many people submit any code samples, let alone with any sort of considered structure. Your portfolio is a vehicle to tell a story about you and your career.

A good portfolio may not get you hired at any particular job. Instead, maintaining a portfolio is you always preparing for your next gig, even when you aren’t actively looking for it. That greatly increases the chances of landing a gig that’s worthy of you. Fortune favors the prepared.

Valued work

The saddest thing I’ve seen in the last few years of interviewing software engineering candidates is that a fair number end up in career dead ends. They work for the wrong sort of company, they’re trying to get out and they’re struggling because they’re unprepared to be most anywhere else.

There are two major classes of employer I see this career anti-pattern in. First are places like banks or other large institutions where software is some manner of back-office function. The second is the small “cowboy” shop where “there’s no time” for doing things the “right way.”

What does the anti-pattern look like? There are a few tells:

First, the organization/boss can’t abide anything beyond the briefest manual testing. Unit testing, functional testing, an agile validation and verification effort or a waterfall pass through a separate QA group are all “too much,” unheard of or seen as unnecessary.

Second, the developer is solving the same problem, the same way year after year. Business runs in cycles. At the same time, if this August is the same as August last year and August two or more years ago, watch out. Quite likely, the business is ignoring or simply unaware of something that could be better.

Third, the developer works on their own or with an equally inexperienced coworker. Even with two of them, it’s really not pair programming or XP; it’s just two developers working in separate silos who don’t talk to each other. All of the other engineers may have been around the block a few times, some more than others, but no one has left the neighborhood in years. This ties back to the point above: if no one is seeing anything new, no one has anything new to bring to the team, and everyone stagnates together.

Fourth, there are no consistent development practices, like code review. Continuous integration is still magic to a lot of folks. Depressingly, source control that everyone on the team actually uses, let alone uses with a consistent approach, is far from universal, too. No project or prototype is too small for ‘git init’.
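That bar is as low as it sounds. The entire ritual for a new prototype:

    # Source control for even a throwaway spike costs three commands.
    git init
    git add .
    git commit -m 'Initial spike'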

There are plenty of other smells, but those four are scary enough to start with.

Now, for the sake of discussion, let’s stipulate that you have a friend you’re concerned about: you recognize that this friend’s job hits this group of smells, and your friend is casting about for a new gig and not getting anywhere.

First, just because writing software for a company or a contract hits one or more of these anti-pattern points doesn’t make them bad people or a bad employer. This is orthogonal to all of that. Jobs are scarce, everyone’s circumstances are different and, as a profession, we’re really, really lucky. That said, what’s going to help keep your friend’s skills relevant? Let’s start with a harsh gate.

Avoid companies that don’t understand what it is your friend does, why they hired your friend and, thereby, what your friend’s value to the organization is. A lot of the time, your friend will need to talk to others, like their hiring manager, about whether or not the company understands what it needs. If it doesn’t, your friend has two options: try to educate the company or find another job. The first option sometimes works, and your friend shouldn’t be shy about trying it. Yet if your friend is months into a gig and it still isn’t clear to everyone what your friend’s value is, advise your friend to move on.

Avoid companies that don’t support good engineering practices. If this is your friend’s first engineering position, understand that your friend is at a disadvantage in being persuasive about good engineering practices. If your friend knows what good practices are and can’t convince the business to adopt them, your friend should move on.

The key for your friend is finding a gig that values that friend’s craft and the standards and practices of that craft. Software engineering is a diverse field, so it’s quite possible people in different areas of the field are going to have different definitions. But, I’ll assert the following:

  • Everyone, individually, needs to be the most passionate advocate for their own career
  • Everyone, individually, needs to stay abreast of what is going on in their chosen field
  • Everyone, individually, needs to practice the skills that will help get them to where they want to go
  • A company, or at worst, a hiring manager, should understand why they are hiring a developer and how that developer’s practices contribute to the value the developer brings to the company.

Further, developers understanding the following things have a definite advantage over developers who do not:

  • Pair programming
  • Continuous Integration
  • The Gang of Four Design Patterns (even if you aren’t working in Java or C)
  • Behavior or Test Driven Development
  • Software deployment methodologies

This is beyond just having knowledge of languages and algorithms. It’s knowing and understanding how our peer group works.

Now, what might you tell your friend about improving their situation? There are two books I recommend starting with to think through software development practices and career development: The Pragmatic Programmer by Dave Thomas and Andy Hunt, and The Passionate Programmer by Chad Fowler. They are my go-to “kickstart someone’s career” books.

Above all else, your friend has to value their own work and skills in order to match well with an employer who respects what they bring to the table and supports those skills. The same goes for you, too.
