Engineers on the AI/ML team build distributed systems for ML Ops, data preparation, and the scale-up of R&D prototypes into production-ready systems. Most of our applications are based on deep learning, with particular emphasis on NLP tasks using techniques such as transfer learning with BERT. The successful candidate will have a conceptual understanding of machine learning workflows, ideally with professional experience implementing them. AI/ML forms a core part of DISCO’s brand and vision, and this position provides an opportunity to implement technology that will transform the legal domain, with significant benefits for society more broadly.
This position also involves providing direct support to ML Scientists and Engineers: troubleshooting infrastructure and software issues within our highly distributed AI systems; improving hardware utilization, throughput, and latency; hunting rare bugs observed in our systems; developing regression tests and telemetry to prevent future bugs; and implementing more robust software in response. This is not a direct production role, but we take pride in what we do and stand behind any software we deploy. While working with us, you will employ processes, coding methodologies, architecture techniques, and testing strategies that improve robustness, maintainability, and reliability. This experience will significantly strengthen your profile as a Software Architect or Machine Learning Engineer.
As a software engineer, you will work with ML Scientists to prepare the data they need and write software that speeds up their work, from initial exploratory research through optimization and all the way to production. You will reproduce any production issues within our sandbox and use debuggers to perform root cause analysis. In the lab, we look at solutions from a combined software and data perspective and treat the data as the foundation of our solutions. As such, you will develop an understanding of data transformations, data distribution shifts, data clustering, anomaly detection, and more.
What You'll Do
- Design, develop, debug, and deploy software to support AI/ML solutions in an R&D context and in backend production components
- Architect, develop, and improve ML Ops infrastructure in support of ML Scientists
- Harden research prototypes in preparation for production deployment
- Provide engineering support for top-level ML scientists to help them access and manipulate data and build research prototypes
- Advise ML scientists on the engineering implications and opportunities of their ideas
- Adhere to appropriate engineering standards of scalability and robustness in both research and production-ready code
- Participate in domain-driven design processes for ML Ops infrastructure, from requirements gathering to system design, implementation, and deployment
Who You Are
- 7-10+ years of experience in software development and deployment
- 2+ years of experience with AWS
- Experience across the software development lifecycle, from design through development, deployment and maintenance
- Experience with data management systems, including relational databases, data frames, and distributed file systems
- Reasonable prior exposure to best practices in software development, and a commitment to implementing these practices appropriately in context
- Interpersonal communication skills that facilitate productive professional interaction with ML scientists and other engineers
- Conceptual knowledge of machine learning techniques and workflows
Even Better If You Have…
- Basic understanding of machine learning methods, especially deep learning for NLP
- Familiarity with a few third-party tools and libraries in the ML space, such as PyTorch, Pandas, TensorBoard, CometML, and CUDA
- Continuous Integration/Continuous Deployment (CI/CD) and Infrastructure as Code (IaC)
- Cloud Provider – AWS: EC2, Lambda, Aurora, Redshift, DynamoDB, ECS, SQS, SNS, Kinesis, S3, CloudFront, CloudFormation, SageMaker, KMS, CodePipeline, etc.
- DSL-based Search: multiple large-scale Elasticsearch Clusters searched using our Disco Query Language (DQL)
- Event Bus: Kafka and Schema Registry
- 3rd Party Vendors: Redis, Auth0 for Cloud Identity Federation (SSO, SAML, etc.)
- AI: MinHash, FastText, Word2Vec, Convolutional Neural Networks, Algorithmia (Lambda with GPUs) for training, PyTorch, Recurrent Neural Networks, Latent Dirichlet Allocation for Topic Modeling, etc.
- Deployment: Terraform, Docker (via ECS), Consul for App Config, Service Discovery, Shared Secrets
- Visibility: ELK Stack for logging, Datadog, New Relic, Sentry.io
- Transport Mechanisms: Protobuf, Avro, HTTP REST/JSON
- CI/CD: Jenkins, CodePipeline, GitHub, Artifactory
DISCO provides a cloud-native, artificial intelligence-powered legal solution that simplifies ediscovery, legal document review and case management for enterprises, law firms, legal services providers and governments. Our scalable, integrated solution enables legal departments to easily collect, process and review enterprise data that is relevant or potentially relevant to legal matters.
In 2020, 171 law firms in the 2020 AmLaw 200 used DISCO in the course of legal work on behalf of their clients. More than 800 enterprises, law firms, legal services providers and government organizations are DISCO customers.
Are you ready to revolutionize the practice of law? Join us!
Perks of DISCO
- Open, inclusive, and fun environment
- Benefits, including medical, dental and vision insurance, as well as 401(k) (EU coming soon)
- Competitive salary plus stock options
- Flexible PTO
- Opportunity to be a part of a company that is revolutionizing the legal industry
- Growth opportunities throughout the company
We are an equal opportunity employer and value diversity. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Please note that DISCO has a mandatory COVID vaccination policy which requires all employees in the U.S. to be fully vaccinated, subject to applicable legal exemptions.