HPC & AI Systems Administrator
The HPC & AI Systems Administrator has deep technical knowledge of the design and deployment of data centers and their associated subsystems, including data center layout, mechanical design, cooling, power delivery, and other critical areas of data center design. The role's deliverables may include designing Intel's data centers, supporting customers in designing their own data centers, or developing new products and technologies based on data center design expertise.
HPC Frontier Lab / CRT-DC runs the Intel High Performance Computing benchmarking cluster called Endeavour. Endeavour is our largest and most renowned cluster, showcasing Intel Architecture in support of deals, development, performance optimization, and much more. We are system integrators of future platforms. We also host other clusters supporting HPC, Cloud, and Enterprise work, as well as clusters for Technology, Pathfinding, and Innovation.
The HPC & AI Systems Administrator will be responsible for:
- Providing support and maintenance of large cluster hardware and software for high availability, consistency, and optimized performance.
- Managing various operating systems.
- Supporting hardware such as rack-mounted servers and workstations.
- Supporting the latest Intel HPC data center technologies, including servers, fabric, and storage.
- Utilizing their skills in the areas of cluster debugging, Linux scripting, cluster validation tests, server expansion, file system tests, and benchmarks.
- Serving as a consultant for all projects and customers of the CRT Datacenter, creating and improving methodologies used in the datacenter to enhance the performance, reliability and manageability of the CRT clusters.
The ideal candidate should exhibit the following behavioral skills:
- Relationship management
- Effective influencing
- Agile written and verbal communicator
Minimum qualifications are required to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates.
- Bachelor's degree in Computer Science, Computer Engineering, or a related field and 4+ years of experience, OR Master's degree in Computer Science, Computer Engineering, or a related field and 3+ years of experience
- 3+ years of Linux experience supporting complex servers
- 3+ years of experience installing and managing Linux operating systems on servers
- 3+ years of experience administering a Linux server for multiple users
- 3+ years of experience managing multiple identical Linux servers in a cluster
- 1+ year of experience with the technical concepts, architecture, systems, development methods, and disciplines associated with the defined program, and using that knowledge to accelerate project completion
- Programming experience in at least one of the following languages: C, Python, or Bash
- Experience managing cluster systems with 100+ nodes
- Experience with Gigabit Ethernet
- Experience with high performance interconnects, preferably Mellanox InfiniBand or Omni-Path
- Experience managing HPC clusters with discrete GPUs (Nvidia, AMD or Intel)
- Experience with containers (Singularity, Podman, Charliecloud, Docker, Kubernetes, others)
- Experience administering high performance cluster file systems (Lustre, GPFS, others)
- Experience with supporting AI frameworks (TensorFlow, others)
- Experience with Extreme or Cisco network hardware setup and configuration
- Experience with MPI libraries, preferably Intel MPI
- Experience writing HPC applications
- Experience with containerization as it pertains to HPC / AI workloads
- Experience with virtualized networks
- Experience managing Cloud based cluster systems
- A+ certification or equivalent experience.
- RHCSA certification or equivalent
- CCENT certification or equivalent
- RHCE certification or equivalent
- CCNA certification or equivalent
Requirements listed would be obtained through a combination of industry-relevant job experience, internship experience, and/or schoolwork, classes, or research.
Inside this Business Group
Intel Architecture, Graphics, and Software (IAGS) brings Intel's technical strategy to life. We have embraced the new reality of competing at a product and solution level—not just a transistor one. We take pride in reshaping the status quo and thinking exponentially to achieve what's never been done before. We've also built a culture of continuous learning and persistent leadership that provides opportunities to practice until perfection and filter ambitious ideas into execution.
All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.